14. ABSTRACT

During this project year, we performed the following tasks: (1) continued to collect data sets of digitized film mammograms for testing our CAD
system, (2) investigated a bilateral approach to reduce the false positives (FPs) of the single CAD system, (3) developed image processing techniques to
improve mass detection on prior mammograms, and (4) continued to develop a two-view information fusion method to improve the performance of
the single CAD system.

In  summary,  we  have  investigated  a  number  of  areas  in  CAD  of  mammographic  masses  and  evaluated  the  new  techniques  for  mass  detection  on 
mammograms.  We  have  made  progress  in  three  of  the  tasks  proposed  in  the  project.  We  have  found  that  our  new  computer-vision  techniques  can 
improve  the  performance  of  the  CAD  systems.  We  will  continue  the  development  of  the  CAD  system  in  the  coming  years. 
(4) Introduction

Recent clinical studies have shown that computer-aided diagnosis (CAD) systems are
helpful for improving cancer detection by radiologists on mammograms [1-6]. To evaluate the
effectiveness  of  a  CAD  system  in  detecting  cancers  that  are  likely  to  be  missed  by  radiologists, 
one  way  is  to  study  its  accuracy  in  detecting  missed  cancers  on  prior  mammograms  (the 
mammograms  in  previous  exams  on  which  the  cancer  can  be  seen  retrospectively).  Several 
studies have demonstrated that CAD systems have the potential to detect missed cancers on
prior mammograms [7-11]. However, the performance of a CAD system on prior mammograms is
generally much lower than its performance on the current mammograms (the mammograms on
which cancer is detected). Recently, one study investigated the performance change between
prior  mammograms  and  current  mammograms  when  using  the  CAD  system  trained  by  current 
mammograms  and  another  by  prior  mammograms.  It  was  concluded  that  CAD  schemes  trained 
with  the  current  mammograms  do  not  perform  optimally  in  detecting  masses  depicted  on  prior 
images  and  vice  versa. 

The  goal  of  this  proposed  project  is  to  develop  a  CAD  system  using  advanced  computer 
vision  techniques  to  detect  masses  using  retrospectively  detected  cancers  on  prior  mammograms 
and  incorporate  the  developed  CAD  system  into  our  current  CAD  system.  We  hypothesize  that  a 
dual  CAD  system,  which  combines  a  system  trained  with  subtle  lesions  retrospectively  seen  on 
prior  mammograms  and  a  system  trained  with  cancers  detected  on  current  mammograms,  should 
increase  the  sensitivity  of  detecting  cancers  at  the  early  stage  without  compromising  its  ability  to 
detect  less  subtle  cancers.  To  accomplish  this  goal,  we  will  (1)  collect  a  large  database  of 
masses  on  digitized  prior  and  current  film  mammograms  (DFMs)  for  training  and  testing  the 
CAD  system,  (2)  develop  single-view  computer  vision  techniques  for  mass  detection  and 
classification  in  prior  DFMs,  (3)  reduce  false  positives  (FPs)  by  correlation  of  image  information 
from  two-view  mammograms,  (4)  combine  the  new  CAD  system  with  our  current  CAD  system 
without an increase in overall FPs, and (5) perform an ROC study to evaluate the effects of CAD on
radiologists’  accuracy  in  detecting  subtle  cancers.  Although  we  do  not  plan  to  develop  such  a 
system  for  digital  mammograms  because  there  will  not  be  enough  prior  digital  mammograms 
with  cancers  available  for  the  development,  the  general  methodology  developed  in  this  study  can 
be  adapted  to  CAD  systems  for  digital  mammograms  in  the  future. 

At  the  conclusion  of  this  project,  we  expect  that  a  fully  automated  CAD  system  will  be 
developed  which  can  be  used  for  detection  of  masses  on  DFMs.  The  general  methodology 
developed  in  this  study  may  also  be  adapted  to  develop  similar  software  for  other  CAD  systems. 
The  significance  of  this  project  is  that  it  will  develop  a  CAD  system  which  can  further  improve 
radiologists’  accuracy  in  detecting  breast  cancers  at  an  early  stage.  Since  early  detection  and 
treatment  can  reduce  breast  cancer  mortality  rate,  the  CAD  system  will  be  useful  for  increasing 
the  effectiveness  of  mammographic  screening. 
(5) Body

The current year (6/1/06-5/31/07) is the third year of the project. We requested and
obtained approval for a no-cost time extension of the project, so this is a regular annual
progress report instead of a final report. The studies that we performed this year are
described in detail below.

(A) Collection of a Database of Digitized Screen-film Mammograms (DFM) with
Multiple Examinations

In this project year, we continued to collect a data set of digitized screen-film
mammograms from patient files in the Department of Radiology at the University of Michigan
with Institutional Review Board (IRB) approval. Two independent data sets of mammograms
were collected for this study; one contained mammograms with masses and the other contained
normal mammograms. The normal data set was used to estimate the false positive (FP) marker
rates during testing [12-14]. To date, the mass data set contains 220 cases with 220 masses. Of these, 190
cases include both the current mammograms on which the mass was detected by radiologists and the
prior mammograms obtained from previous exams; the remaining 30 cases have only the current mammograms.
In  total,  886  mammograms  including  440  current  mammograms  and  446  prior  mammograms 
were  collected.  The  true  location  of  each  mass  was  identified  by  an  experienced  Mammography 
Quality  Standards  Act  (MQSA)  radiologist.  The  radiologist  also  measured  the  mass  size  and 
provided  descriptions  of  the  mass  margin,  shape,  conspicuity,  and  breast  density. 

(B)  Investigation  of  a  Bilateral  Approach  to  Reduce  the  FPs  on  Single  CAD  System 

In an effort to improve the performance of our single CAD system, the first study of this
year investigated an FP reduction method based on analysis of bilateral mammograms for
computerized mass detection systems. Our paper on this work has been accepted for publication in
Medical Physics [15]. The study is summarized below.

1.  Data  Set 

A  database  of  mammograms  was  collected  from  patient  files  at  the  Department  of  Radiology 
with  Institutional  Review  Board  (IRB)  approval.  Two  data  sets  are  used:  a  mass  data  set 
containing  bilateral  digitized  mammograms  with  malignant  or  benign  masses  and  a  no-mass  data 
set  containing  bilateral  digitized  mammograms  without  masses,  verified  by  an  experienced 
radiologist. All cases had four mammographic views: the craniocaudal (CC) view and the mediolateral oblique (MLO) view
of both breasts. The mass set contained 276 cases, so 552 bilateral pairs were
available. The no-mass data set contained 65 cases, so 130 bilateral pairs were available.
Fifty cases of the no-mass set were consecutive normal screening cases from our patient files,
and the additional 15 cases were visually judged by radiologists to have dense breasts. The mass data
set  was  used  to  estimate  the  detection  sensitivity  and  the  no-mass  data  set  was  used  for 
estimating  the  FP  rate.  In  the  mass  data  set,  each  patient  had  a  biopsy-proven  mass  in  one  of  the 
breasts,  resulting  in  a  total  of  276  masses,  166  of  which  were  benign  and  110  malignant.  An 
MQSA  radiologist  identified  the  location  of  the  masses,  measured  the  mass  sizes  as  the  longest 
dimension seen on the two-view mammograms, provided descriptors of the mass shapes and
mass margins, and also provided an estimate of the breast density in terms of BI-RADS category.

2.  Methods 

In  order  to  improve  the  performance  of  our  CAD  system,  we  developed  a  new  bilateral  CAD 
system  that  combines  the  unilateral  features  with  the  bilateral  features  to  reduce  FPs.  Our 
bilateral  CAD  system  consists  of  five  steps:  (1)  mass  candidate  (MC)  detection,  (2) 
corresponding  ROIs  (CR)  extraction,  (3)  feature  analysis,  (4)  feature  combination,  and  (5) 
bilateral  CAD  system  generation.  Figure  1  shows  the  block  diagram  for  our  bilateral  CAD 
system. 


Figure 1. The block diagram of the bilateral CAD system for FP reduction on mammograms.


The mass candidates on the individual mammograms are detected by the prescreening step of
our unilateral CAD system [16, 17]. A gradient field analysis is applied to the mammogram and the
locations of high gradient convergence are identified as mass candidates. An ROI of 256 × 256
pixels is then centered at each location of high gradient convergence. For each candidate, a
regional registration technique [16-19] is used to define an ROI that is “symmetrical” to the object
location on the contralateral mammogram. An ROI of 256 × 256 pixels is then centered at the
triangle as the contralateral ROI. For the feature analysis, spatial gray level dependence (SGLD) texture features and
morphological features are extracted from both the ROI containing the detected mass candidate
and its contralateral ROI. Let MC[i,j] and CR[i,j] be the ith feature of the jth mass candidate
and the ith feature of the jth corresponding ROI, respectively. The ith bilateral feature (BF) is
derived from the ith unilateral feature by the expression below.

$$BF[i,j] = \frac{\max\left(MC[i,j],\ CR[i,j]\right)}{\min\left(MC[i,j],\ CR[i,j]\right)} \qquad (1)$$

Using  the  bilateral  features  as  the  input  predictor  variables,  a  linear  discriminant  analysis  (LDA) 
classifier  is  trained  to  merge  the  features  into  a  bilateral  score.  The  LDA  is  trained  with  a  leave- 
one-case-out  resampling  scheme  for  stepwise  feature  selection  within  the  training  set.  The 
bilateral  score  incorporates  the  “symmetry”  information  to  differentiate  symmetric  (likely  FPs) 
and asymmetric (likely masses) tissues on the left and right breasts. To merge the information
from the unilateral and the bilateral features in our CAD system, a new feature space is formed
by combining the unilateral features and the bilateral score of the mass candidate. Finally, the
bilateral CAD system output score is obtained by a second LDA classifier that is trained to
differentiate the true masses from FPs in the new feature space, again using the training set.
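
The derivation of the bilateral features can be illustrated with a minimal sketch. It assumes that Eq. (1) is read as the ratio of the larger to the smaller of the two unilateral feature values and that all feature values are positive. The function name `bilateral_features`, the guard `eps`, and the feature values are hypothetical and chosen only for illustration; this is not the code of our CAD system.

```python
def bilateral_features(mc, cr, eps=1e-12):
    """Return one bilateral feature per unilateral feature.

    mc: feature values of a mass candidate ROI
    cr: the same features measured on the contralateral ROI
    Assumes positive feature values; eps (an assumption of this sketch)
    guards against division by zero.
    """
    if len(mc) != len(cr):
        raise ValueError("feature vectors must have the same length")
    return [max(a, b) / max(min(a, b), eps) for a, b in zip(mc, cr)]


# Hypothetical feature values for one candidate and its contralateral ROI.
symmetric = bilateral_features([0.80, 12.0], [0.80, 12.0])   # similar tissue
asymmetric = bilateral_features([0.80, 12.0], [0.20, 3.0])   # dissimilar tissue
```

Under this reading, a bilateral feature close to 1 indicates similar tissue on both sides (a likely FP), and larger values indicate asymmetry (a likely mass).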


3.  Results 


Figure 2. (a) Image-based and (b) case-based average test FROC curves from the unilateral and
the bilateral CAD systems. The FP rates were estimated from detection on mammograms in the
test subsets with masses.


Figure 2 shows the average free response receiver operating characteristic (FROC) curves
for the test sets using the unilateral and bilateral CAD systems. The corresponding trained
LDA classifiers were used for FP reduction and the FP rates were estimated from the test
subsets with masses. The bilateral CAD system achieved case-based sensitivities of 70%, 80%,
and 85% at average FP rates of 0.53, 0.87, and 1.15 FPs/image, respectively, on the test data set.
In comparison to the average FP rates for the unilateral CAD system of 0.70, 1.11, and 1.46
FPs/image, respectively, at the corresponding sensitivities, the FP rates were reduced by 24%,
21%, and 21% with the bilateral symmetry information. Figure 3 shows the average test FROC
curves for the unilateral and bilateral CAD systems with the FP rates estimated on the set of no-mass
mammograms. Figure 4 compares the average test FROC curves for the unilateral and
bilateral CAD systems on malignant cases only. Figure 5 shows the average test FROC curves
for the unilateral and bilateral CAD systems with the sensitivities estimated on malignant cases
only and the FP rates estimated on the set of no-mass mammograms. We employed the jackknife
FROC (JAFROC) analysis [23, 24] to evaluate the difference in the FROC curves obtained from the
unilateral CAD system and the bilateral CAD system. The difference between the figures of
merit (FOMs) for the unilateral and the bilateral CAD systems was statistically significant
(p < 0.05).
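
As a worked check of the arithmetic, the relative FP reductions can be recomputed from the FP rates quoted above. The helper name `relative_reduction` is hypothetical, and the inputs are the rounded rates from the text, so the results may differ slightly from percentages computed on unrounded values.

```python
def relative_reduction(before, after):
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


# FPs/image at case-based sensitivities of 70%, 80%, and 85% (from the text).
unilateral = [0.70, 1.11, 1.46]
bilateral = [0.53, 0.87, 1.15]

reductions = [relative_reduction(u, b) for u, b in zip(unilateral, bilateral)]
```

With the rounded rates, the reductions come to approximately 24.3%, 21.6%, and 21.2%. The small difference from the reported 21% at 80% sensitivity presumably reflects rounding of the FP rates.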


Figure 3. (a) Image-based and (b) case-based average test FROC curves from the unilateral
and the bilateral CAD systems. The FP rates were estimated from detection on
mammograms in the no-mass data set.


4.  Discussion  and  Conclusion 


Symmetry  between  breast  structures  in  bilateral  pairs  of  mammograms  is  an  important 
feature  used  by  radiologists  for  mass  detection  or  FP  reduction.  Similar  structures  that  appear 
in  both  the  right  and  left  mammograms  are  more  likely  to  be  normal  tissue  than  abnormal 
lesions.  Our  bilateral  analysis  translates  this  human  intelligence  to  computer  vision  that  can 
recognize  the  symmetry  of  breast  tissue  on  bilateral  mammograms  to  improve  detection 
accuracy.  To  our  knowledge,  this  FP  reduction  strategy  for  mass  detection  has  not  been 
reported  previously.  Our  results  demonstrate  that  the  bilateral  features  can  be  utilized  to 
differentiate  the  similarity  and  dissimilarity  between  tissues  at  corresponding  locations  in  the 
bilateral  views,  and  can  be  useful  for  improving  the  performance  of  a  unilateral  CAD  system  by 
further  reducing  the  FPs. 


Figure 4. (a) Image-based and (b) case-based average test FROC curves from the unilateral and
bilateral CAD systems for detection on cases with malignant masses only. The FP rates were
estimated from the test subset with masses.

Figure 5. (a) Image-based and (b) case-based average test FROC curves from the unilateral and
bilateral CAD systems for detection on cases with malignant masses only. The FP rates were
estimated from the no-mass data set.
(C) Development of Image Processing Techniques for Improvement of Mass Detection
on  Prior  Mammograms 

The second study of this year was to develop image processing techniques for improvement
of CAD on prior mammograms. Our results were presented at the SPIE meeting in 2007 [20]. In addition,
these techniques have been applied to mass detection on full field digital mammograms (FFDM)
and screen-film mammograms (SFM). We found that they were useful for improving the
accuracy of mass detection on current mammograms as well. A journal article with the results
was published in Academic Radiology in 2007 [21]. The study is summarized below.

1.  Data  Set 


Figure 6. Histograms of (a) the mass sizes and (b) the mass visibility for 299 masses on current
mammograms and 301 masses on prior mammograms in our data set. The size of the masses in this
data set ranged from 3 to 42 mm. The visibility was evaluated on a 10-point rating scale with 1
representing the most visible (obvious) masses and 10 the most difficult (subtle) cases relative to
the cases seen in the radiologist's clinical practice. The masses that were not visible are plotted
in the column labeled “INV”.


All  mammograms  in  this  study  were  collected  from  patient  files  in  the  Department  of 
Radiology  at  the  University  of  Michigan  with  Institutional  Review  Board  (IRB)  approval.  The 
mammograms were digitized with a LUMISYS 85 laser film scanner with a pixel size of
50 µm × 50 µm and 4096 gray levels. The full resolution mammograms were first smoothed with a
2 × 2 box filter and subsampled by a factor of 2, resulting in images with a pixel size of
100 µm × 100 µm. These images were used for input to our CAD system. The data set we used in
this  study  contained  159  cases.  Each  exam  had  two  mammographic  views,  resulting  in  a  total  of 
318  current  mammograms  and  402  prior  mammograms.  Forty-two  patients  had  two  years  of 
prior examinations. All mammograms were obtained before biopsy. There were 159 biopsy-proven
masses in this data set. Figure 6 shows the histograms of mass sizes and visibility
for the comparison of current and prior masses. The size of a mass was estimated
as its longest diameter seen on the mammograms. The visibility of the masses was rated by an
experienced radiologist on a 10-point scale with 1 representing the most visible masses and 10
the most difficult case relative to the cases seen in clinical practice. The mass size ranged from 3
to 42 mm (mean size: 14.3±8.6 mm on current mammograms and 10.9±6.6 mm on prior
mammograms)  and  the  visibility  ratings  extended  over  the  entire  range.  For  the  current 
mammograms,  140  of  the  masses  were  visible  on  both  views  and  19  visible  on  only  one  view. 
For  the  prior  mammograms,  100  masses  were  visible  on  both  views  and  101  visible  only  on  one 
view.  Therefore,  there  were  299  visible  and  19  invisible  masses  on  current  mammograms  and 
301  visible  and  101  invisible  masses  on  prior  mammograms  if  the  masses  were  counted 
independently  by  mammographic  view. 
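
The per-view counts above follow directly from the numbers of masses visible on both views and on one view only, as the short check below shows. The function name `count_by_view` is hypothetical; the inputs are the counts given in the text.

```python
def count_by_view(both_views, one_view):
    """Count masses independently by view; every mass is imaged on two views."""
    visible = 2 * both_views + one_view
    invisible = one_view   # the second view, on which the mass cannot be seen
    return visible, invisible


current = count_by_view(both_views=140, one_view=19)    # current mammograms
prior = count_by_view(both_views=100, one_view=101)     # prior mammograms
```

The totals per group, 299 + 19 = 318 and 301 + 101 = 402, agree with the numbers of current and prior mammograms in the data set.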

2.  Methods 

Our CAD system consists of four processing steps: (1) pre-screening of mass candidates,
(2) identification of suspicious objects, (3) extraction of morphological and texture features, and (4)
classification between the normal and the abnormal regions by using rule-based and LDA
classifiers.

For  the  pre-screening  stage,  we  developed  a  new  prescreening  technique  in  which 
gradient  field  analysis  was  combined  with  Hessian  analysis  to  identify  mass  candidates.  Both 
gradient  field  and  Hessian  analyses  were  designed  to  enhance  approximately  circular  structures 
on  mammograms  and  to  suppress  the  objects  with  other  shapes.  Gradient  field  analysis  used  the 
information  of  gradient  field  directions  and  Hessian  analysis  used  the  second  derivatives  by 
solving  for  the  eigenvalues  of  the  Hessian  matrix.  After  this  enhancement  filtering,  the  local 
maxima  within  the  breast  region  were  identified  as  the  mass  candidates  on  each  mammogram. 
The  suspicious  structure  in  each  identified  location  was  initially  extracted  by  a  seed-based  region 
growing  method.  An  active  contour  method  was  then  used  to  further  refine  the  initial 
segmentation.  Morphological,  gray  level  histogram  and  run-length  statistics  (RLS)  features  were 
extracted  from  the  original  region  of  interest  (ROI)  and  the  orientation  field  of  the  ROI  for 
reduction  of  FPs. 
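
The idea behind the Hessian part of the prescreening can be sketched as follows. This is a simplified stand-in, not the filter used in our CAD system: it assumes central finite differences at a single scale, omits the gradient field analysis entirely, and uses hypothetical names (`hessian_eigenvalues`, `blob_response`) and a response formula chosen only for illustration.

```python
import math


def hessian_eigenvalues(img, r, c):
    """Eigenvalues of the 2x2 Hessian at pixel (r, c) by central differences."""
    ixx = img[r][c + 1] - 2 * img[r][c] + img[r][c - 1]
    iyy = img[r + 1][c] - 2 * img[r][c] + img[r - 1][c]
    ixy = (img[r + 1][c + 1] - img[r + 1][c - 1]
           - img[r - 1][c + 1] + img[r - 1][c - 1]) / 4.0
    mean = (ixx + iyy) / 2.0
    root = math.sqrt(((ixx - iyy) / 2.0) ** 2 + ixy ** 2)
    return mean - root, mean + root   # lambda1 <= lambda2


def blob_response(img, r, c):
    """Response that is high for bright, approximately circular structures.

    Both eigenvalues are negative at a bright blob. Their ratio is near 1
    for a circular structure and near 0 for a line-like structure.
    """
    l1, l2 = hessian_eigenvalues(img, r, c)
    if l2 >= 0:
        return 0.0                    # not a bright blob: suppress
    return abs(l2) * (l2 / l1)        # smaller curvature weighted by circularity


# Toy 5x5 images: a round bright spot and a bright vertical line.
blob = [[0, 0, 0, 0, 0],
        [0, 1, 2, 1, 0],
        [0, 2, 4, 2, 0],
        [0, 1, 2, 1, 0],
        [0, 0, 0, 0, 0]]
line = [[0, 0, 4, 0, 0] for _ in range(5)]

blob_score = blob_response(blob, 2, 2)
line_score = blob_response(line, 2, 2)
```

In this toy example the round spot receives a positive response while the line is suppressed, which is the behavior the enhancement filtering is designed to achieve before the local maxima are taken as mass candidates.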

The  hold-out  method  was  used  for  training  and  testing  our  CAD  system.  We  randomly 
separated  the  entire  data  set  by  case  into  two  independent  subsets,  the  training  subset  including 
78  cases  with  156  current  and  200  prior  mammograms  and  the  test  subset  including  81  cases 
with  162  current  and  202  prior  mammograms.  The  training  included  selection  of  proper 
parameters  and  features  for  the  classifier  in  the  CAD  system.  Once  the  training  was  completed, 
the  parameters  and  features  were  fixed  for  testing.  The  new  system  was  trained  by  using  prior 
mammograms in the training set only. The performance of the new system was compared with
that  of  the  previous  CAD  system  on  the  current  and  prior  mammograms  in  the  test  set. 

During  training,  feature  selection  with  stepwise  LDA  was  employed  to  obtain  the  best 
feature  subset  and  reduce  the  dimensionality  of  the  feature  space  to  design  an  effective  classifier. 
The detailed procedure has been described elsewhere [17]. Briefly, at each step one feature was
entered into or removed from the feature pool by analyzing its effect on the selection criterion, which
was  chosen  to  be  the  Wilks'  lambda  in  this  study.  Since  the  appropriate  threshold  values  for 
feature  entry,  feature  elimination,  and  tolerance  of  feature  correlation  were  unknown,  we  used  an 
automated  simplex  optimization  method  to  search  for  the  best  combination  of  thresholds  in  the 
parameter  space.  The  simplex  algorithm  used  a  leave-one-case-out  resampling  method  within 
the training subset to select features and estimate the weights for the LDA classifier. To have a
figure-of-merit  to  guide  feature  selection,  the  test  discriminant  scores  from  the  left-out  cases 
were  analyzed  using  receiver  operating  characteristic  (ROC)  methodology.  The  accuracy  for 
classification  of  masses  and  FPs  was  evaluated  as  the  area  under  the  ROC  curve,  Az.  In  this 
approach,  feature  selection  was  performed  without  the  left-out  case  so  that  the  test  performance 
would  be  less  optimistically  biased.  However,  the  selected  feature  set  in  each  leave-one-case-out 
cycle  could  be  slightly  different  because  every  cycle  had  one  training  case  different  from  the 
other  cycles.  In  order  to  obtain  a  single  trained  classifier  to  apply  to  the  hold-out  test  subset,  a 
final stepwise feature selection was performed with the best combination of thresholds, found in
the  simplex  optimization  procedure,  on  the  entire  training  subset  to  obtain  the  final  set  of 
features  and  estimate  the  weights  of  the  LDA.  Note  that  the  entire  process  of  feature  selection 
and  classifier  weight  estimation  was  performed  within  the  training  subset.  The  LDA  classifier 
with  the  selected  feature  set  was  then  fixed  and  applied  to  the  test  subset. 
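
The leave-one-case-out partitioning used within the training subset can be sketched as below. The sketch shows only how objects are grouped by case; the stepwise feature selection, the simplex search, and the LDA training are not shown. The function name `leave_one_case_out` and the example case labels are hypothetical.

```python
from collections import defaultdict


def leave_one_case_out(case_ids):
    """Yield (held_out_case, train_indices, test_indices) for each case.

    All objects from one case are held out together, so objects from the
    same patient never appear on both sides of a partition.
    """
    by_case = defaultdict(list)
    for idx, case in enumerate(case_ids):
        by_case[case].append(idx)
    for case, test_idx in by_case.items():
        train_idx = [i for i in range(len(case_ids)) if case_ids[i] != case]
        yield case, train_idx, test_idx


# Hypothetical example: five detected objects that come from three cases.
case_ids = ["A", "A", "B", "C", "C"]
splits = list(leave_one_case_out(case_ids))
```

In each cycle, feature selection and weight estimation would use only the training indices, and the discriminant scores of the held-out objects would be pooled across cycles to estimate Az.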

3.  Results 


Figure 7. Image-based test FROC curves on prior mammograms. Old CAD: detection by
the previous CAD system trained on both current and prior mammograms. New CAD:
detection by the CAD system trained on prior mammograms.

Figure 8. Case-based test FROC curves on prior mammograms. Old CAD: detection by
the previous CAD system trained on both current and prior mammograms. New CAD:
detection by the CAD system trained on prior mammograms.


Figures 7 and 8 show the image-based and case-based FROC curves for detection of
masses on prior mammograms, respectively. The case-based sensitivities for detection of masses
on the prior mammograms (typically subtle masses) in the test subset were 56% and 35% at 1
and 0.5 FPs/image, respectively, with the new CAD system, compared with 48% and 32% at the same
FP rates with the previous CAD system. The improvement with the new system on prior
mammograms was statistically significant (p = 0.036). When the new system was applied to the
detection of masses on the current mammograms (typically average masses) in the test subset,
the case-based sensitivities were 77% and 70% at 1 and 0.5 FPs/image, respectively, compared with 75%
and 56% at the same FP rates with the previous CAD system. The difference between the two
FROC curves for detection of average masses on current mammograms was not statistically
significant (p = 0.184) by JAFROC analysis. Image-based and case-based FROC curves for
detection of masses on current mammograms are shown in Figures 9 and 10, respectively.


Figure 9. Image-based test FROC curves on current mammograms. Old CAD: detection by
the previous CAD system trained on both current and prior mammograms. New CAD:
detection by the CAD system trained on prior mammograms.

Figure 10. Case-based test FROC curves on current mammograms. Old CAD: detection
by the previous CAD system trained on both current and prior mammograms. New CAD:
detection by the CAD system trained on prior mammograms.


4.  Discussion  and  Conclusion 

In  this  study,  we  improved  the  accuracy  of  a  CAD  system  for  detection  of  subtle  masses 
on  prior  mammograms.  A  new  prescreening  method  was  developed  to  improve  the  sensitivity  of 
mass detection. A new mass segmentation method that combined a seed-based region growing
method with an active contour method was also designed. RLS features were extracted from the
original ROIs and the newly derived orientation field of the ROIs for FP reduction. Our CAD
system  can  significantly  improve  the  performance  of  mass  detection  on  prior  mammograms 
without  a  trade-off  in  the  detection  of  masses  on  current  mammograms.  It  is  expected  that  the 
new  CAD  system  can  increase  the  overall  accuracy  for  detection  of  subtle  early-stage  breast 
cancers. 
(D) Continuation of Development of a Two-view Information Fusion Method to
Improve  the  Performance  of  Single  CAD  System 

The third study performed in this project year was to continue to develop a two-view
information fusion method to improve the performance of our CAD system for mass detection.
Our results were presented at the RSNA meeting in Chicago in November 2006 [22]. We have made good
progress on this part and are preparing a journal paper. The study is summarized
below.

1.  Data  Set 

All  mammograms  in  this  study  were  collected  from  patient  files  in  the  Department  of 
Radiology  at  the  University  of  Michigan  with  Institutional  Review  Board  (IRB)  approval.  In 
this  study,  two  data  sets  were  collected:  a  mass  set  with  biopsy-proven  unilateral  malignant  or 
benign masses and a normal set containing bilateral mammograms. The mass set contained 469
cases with 469 biopsy-proven masses, of which 190 were malignant and 279 benign. Each case
contained two mammographic views (the CC view and the MLO view or the lateral view). The normal
set consisted of 50 consecutive normal screening cases from our patient files and an
additional 15 cases visually judged by radiologists to have dense breasts. Each normal case
contained 4 mammographic views from bilateral breasts. The normal data set was used only for
estimating the FP rate during testing. An experienced MQSA radiologist identified the locations
of masses by examining all available information including the diagnostic mammograms and
reports. Of these 469 mass cases, 19 masses (4%) could be seen on only one mammographic view.

2.  Methods 


In  order  to  improve  the  overall  performance  of  our  CAD  system  for  detection  of  masses, 
we developed a two-view fusion technique which combines the information from two
mammographic  views.  Our  method  in  this  study  is  based  on  two  assumptions:  (1)  the 
corresponding  true  masses  on  two  different  mammographic  views  will  exhibit  higher  similarity 
than  the  FPs  detected  by  the  CAD  system,  (2)  the  morphological  and  texture  features  of  the  same 
mass  on  different  views  will  also  show  similar  properties  and  mass  pairs  (TP-TP  pairs)  can  be 
distinguished  from  false  pairs  (TP-FP,  FP-TP,  FP-FP  pairs)  in  the  combined  feature  space  from 
two  different  mammographic  views.  A  schematic  of  our  two-view  system  is  shown  in  Figure  11. 
Figure 11. Schematic diagram of our two-view CAD system for mass detection on
mammograms.  The  system  is  developed  for  screening  mammography  in  which  all  masses, 
regardless  of  malignant  or  benign,  are  considered  positive. 


Figure  12.  Block  diagram  of  the  two-view  information  fusion  for  suspicious  objects  on  CC 
and  MLO  views  of  the  same  breast. 

The key process of our two-view CAD system is the information fusion step, in which
potentially similar suspicious objects on different mammographic views are paired together and a
classifier merges the two-view information and provides a unique fusion score for each
individual suspicious object. For a deformable object like the breast under compression, the
corresponding locations of an object in the two views cannot be determined exactly based on the
two projection mammograms. Our two-view information fusion scheme consists of four steps:
(1) region registration by using geometric information, (2) similarity measure between paired
objects, (3) classification of TP-TP pairs from other FP pairs, and (4) generalization of fusion
score. Figure 12 shows the block diagram for two-view information fusion for suspicious
objects on CC and MLO views of the same breast. The JAFROC method was used to compare
the performance of our two-view CAD system with that of the single CAD system statistically.


3.  Results 

We  randomly  separated  the  cases  in  our  mass  data  set  into  two  independent  data  sets:  238 
cases with 472 images and 231 cases with 462 images. The training and testing were performed
using  the  2-fold  cross  validation  method.  The  detection  performance  of  the  CAD  system  was 
assessed  by  FROC  analysis.  FP  rate  was  estimated  from  the  mammograms  without  masses. 
FROC  curves  were  presented  on  a  per-mammogram  and  a  per-case  basis.  To  evaluate  the 
overall  test  performance,  an  average  test  FROC  curve  was  obtained  as  described  above.  Figures 
13(a) and 13(b) show the comparison of the test performance of the single-view CAD system
and the two-view CAD system by using image-based and case-based average FROC curves,
respectively. The single-view CAD system achieved FP rates of 2.3, 1.7, and 1.3 FPs/image at
the case-based test sensitivities of 90%, 85%, and 80%, respectively. With the two-view fusion
system, the FP rates were reduced to 1.9, 1.4, and 1.1 FPs/image at the corresponding
sensitivities. The improvement was found to be statistically significant (p<0.05) by
the JAFROC method.
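
As a rough reading of these numbers, the relative FP reduction at each sensitivity can be
computed directly from the reported rates; it works out to roughly 15% to 18%. The short
sketch below is illustrative arithmetic only. The function name is ours and is not part of
the CAD system.

```python
# FP rates (FPs/image) quoted in the text at case-based sensitivities of 90%, 85%, 80%.
sensitivities = [0.90, 0.85, 0.80]
single_view_fp = [2.3, 1.7, 1.3]
two_view_fp = [1.9, 1.4, 1.1]


def relative_reduction(before, after):
    """Fractional reduction in FP rate when going from `before` to `after`."""
    return (before - after) / before


reductions = [relative_reduction(b, a) for b, a in zip(single_view_fp, two_view_fp)]

for s, b, a, r in zip(sensitivities, single_view_fp, two_view_fp, reductions):
    print(f"sensitivity {s:.0%}: {b} -> {a} FPs/image ({r:.1%} fewer)")
```

These percentages describe only the three quoted operating points; the statistical
comparison reported above was made on the FROC curves with the JAFROC method.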


[Figure 13 plots, panels (a) and (b); horizontal axis: Number of False Positives per Image]

Figure 13. (a) Image-based and (b) case-based average test FROC curves from the single-view
and the two-view CAD systems. The FP rates were estimated from detection on mammograms in
the no-mass data set.
4. Discussion and Conclusion


In this study, a two-view information fusion method was developed to improve the
performance of our CAD system for mass detection on mammograms. The two-view CAD
system differs from case-based scoring, in which detection of the same mass in either the CC
view or the MLO view is counted as a true positive. In the two-view system, the detected
objects in the two views are correlated and analyzed for similarity, and the likelihood score of
a mass detected in both views may be enhanced relative to FPs. Our results indicate that
two-view fusion significantly improved the overall performance of the single-view CAD system.


(6)  Key  Research  Accomplishments 

• Continued collection of the data sets of digitized film mammograms with multiple
examinations (Task 1).

• Investigation of a bilateral approach to reduce the FPs of the single CAD system (Task 2).

• Development of image processing techniques for improvement of mass detection on prior
mammograms (Task 2).

• Continued development of a two-view information fusion method (Task 3).


(7)  Reportable  Outcomes 

As a result of the support by the USAMRMC BCRP grant, we have conducted studies to
develop a computer-aided diagnosis system for early detection of masses using retrospectively
detected cancers on prior mammograms. We presented the results of these investigations during
this project year, and a journal article that was accepted for publication last year was
published in this project year. In addition, one journal paper was published in Academic
Radiology and one journal paper was accepted for publication in Medical Physics.

Journal  Articles: 


1.  Wei  J,  Chan  HP,  Sahiner  B,  Hadjiiski  LM,  Helvie  MA,  Roubidoux  MA,  Zhou  C,  Ge  J, 
"Dual  system  approach  to  computer-aided  detection  of  breast  masses  on  mammograms", 
Medical Physics, Vol. 33, No. 11, pp. 4157-4168, 2006.

2.  Wei  J,  Hadjiiski  LM,  Sahiner  B,  Chan  HP,  Ge  J,  Roubidoux  MA,  Helvie  MA,  Zhou  C, 
Wu  YT,  Paramagul  C,  Zhang  Y,  “Computer  Aided  Detection  Systems  for  Breast  Masses: 
Comparison  of  Performances  on  Full-Field  Digital  Mammograms  and  Digitized  Screen- 
film  Mammograms”,  Academic  Radiology,  Vol.  14,  No.  6,  pp.  659-669,  2007. 
3. Wu YT, Wei J, Hadjiiski LM, Sahiner B, Zhou C, Ge J, Shi J, Zhang Y, Chan HP,
“Bilateral analysis based false positive reduction for computer-aided mass detection”,
Medical Physics (in press).

Conference  Proceeding: 

1.  Wei  J,  Sahiner  B,  Chan  HP,  Hadjiiski  LM,  Roubidoux  MA,  Helvie  MA,  Ge  J,  Zhou  C, 
and  Wu  YT,  “Computer-aided  detection  of  breast  masses  on  prior  mammograms”,  Proc. 
SPIE, Vol. 6514, pp. 51-57, 2007.

Conference  Presentation: 


1. J. Wei, B. Sahiner, H. P. Chan, M. A. Roubidoux, M. A. Helvie, Y. T. Wu, L. M.
Hadjiiski,  J.  Ge,  C.  Zhou,  “Computer-aided  detection  of  breast  masses  on  mammograms: 
Performance  improvement  using  two-view  information”,  Presentation  at  the  92nd 
Scientific  Assembly  and  Annual  Meeting  of  the  Radiological  Society  of  North  America, 
Chicago,  IL.  November  26-December  1,  2006. 


(8)  Conclusions 

During  this  project  year,  we  first  investigated  an  FP  reduction  method  based  on  analysis 
of  bilateral  mammograms  for  computerized  mass  detection  systems.  Our  results  demonstrate 
that  the  bilateral  features  can  be  utilized  to  differentiate  the  similarity  and  dissimilarity  between 
tissues  at  corresponding  locations  in  the  bilateral  views,  and  can  be  useful  for  improving  the 
performance  of  a  unilateral  CAD  system  by  further  reducing  the  FPs. 


In  a  second  study,  we  developed  several  image  processing  techniques  for  improvement  of 
mass  detection  on  prior  mammograms.  The  new  techniques  can  significantly  improve  the 
performance  of  mass  detection  on  prior  mammograms  without  a  trade-off  in  the  detection  of 
masses  on  current  mammograms.  It  is  expected  that  the  improved  CAD  system  can  increase  the 
overall  accuracy  for  detection  of  subtle  early-stage  breast  cancers. 


In the third study performed in this project year, we continued to develop a two-view
information fusion method to improve the performance of our CAD system for mass detection.
The two-view CAD system is different from case-based scoring in that the detected objects in the
two views are correlated and analyzed for similarity, and the likelihood score of a mass detected
in both views may be enhanced compared with FPs. Our results indicate that two-view fusion
can significantly improve the overall performance of the single-view CAD system.

From the results of these studies, we found that our single CAD system can be improved
by the new image processing techniques and the bilateral and two-view analyses. We have
already shown in the previous annual progress reports that our proposed dual CAD system
approach is a very promising method to improve detection of subtle early breast cancers. In the
coming project year, we plan to investigate the combination of the techniques developed in this
project year with the dual CAD system approach. We expect that the dual CAD system will be
further improved when combined with the new techniques.
In this study, our purpose was to improve the performance of our mass detection system by using
a new dual system approach which combines a computer-aided detection (CAD) system optimized
with “average” masses with another CAD system optimized with “subtle” masses. The two single
CAD systems have similar image processing steps, which include prescreening, object
segmentation, morphological and texture feature extraction, and false positive (FP) reduction by
rule-based and linear discriminant analysis (LDA) classifiers. A feed-forward backpropagation
artificial neural network was trained to merge the scores from the LDA classifiers in the two
single CAD systems and differentiate true masses from normal tissue. For an unknown test
mammogram, the two single CAD systems are applied to the image in parallel to detect suspicious
objects. A total of three data sets were used for training and testing the systems. The first
data set of 230 current mammograms, referred to as the average mass set, was collected from 115
patients. We also collected 264 mammograms, referred to as the subtle mass set, which were one to
two years prior to the current exam from these patients. Both the average and the subtle mass
sets were partitioned into two independent data sets in a cross validation training and testing
scheme. A third data set containing 65 cases with 260 normal mammograms was used to estimate the
FP marker rates during testing. When the single CAD system trained on the average mass set was
applied to the test set with average masses, the FP marker rates were 2.2, 1.8, and 1.5 per image
at the case-based sensitivities of 90%, 85%, and 80%, respectively. With the dual CAD system, the
FP marker rates were reduced to 1.2, 0.9, and 0.7 per image, respectively, at the same case-based
sensitivities. Statistically significant (p<0.05) improvements on the free response receiver
operating characteristic curves were observed when the dual system and the single system were
compared using the test sets with either average masses or subtle masses. © 2006 American
Association of Physicists in Medicine.
Key words: computer-aided detection (CAD), mass detection, mammogram, dual system, artificial
neural network (ANN)


I.  INTRODUCTION 

Breast cancer is one of the leading causes of cancer mortality among women.1 It has been
reported that early diagnosis and treatment can significantly improve the chance of survival
for patients with breast cancer.2-4 At present, the most successful method for the early
detection of breast cancer is screening mammography.3 Various methods are being developed
to improve the accuracy of breast cancer detection. Double reading by radiologists can reduce
the miss rate of radiographic reading. However, double reading will increase the cost of
mammographic screening. An alternative method is to use a trained computer-aided detection
(CAD) system as a second reader.6,7 Recent clinical studies have shown that CAD systems are
helpful for increasing radiologists’ accuracy in detecting breast cancers.8-13

A large volume of literature has been published in the CAD area. CAD systems for
mammography generally consist of two subsystems: one is a mass detection system and the
other is a microcalcification detection system. Detection of masses on mammograms is often
more challenging than detection of microcalcifications. The mass detection systems to date
have employed a single-system approach using various techniques for prescreening of mass
candidates and classification of true and false positives.14-24 Our laboratory incorporated
two-view mammographic information for improved differentiation of true masses and false
positives and obtained promising preliminary results.25 However, development of new methods
to improve the performance of mass detection systems remains an important area of CAD
research.

The CAD systems developed so far have mostly used masses seen on current mammograms
(i.e., the mammograms on which the masses were detected by radiologists) for training. An
important purpose of a CAD system is to serve as a second reader that alerts radiologists to
subtle cancers that may be overlooked. To study the ability of a CAD system in detecting
subtle cancers that are likely to be missed by radiologists, one way is to evaluate its
accuracy in detecting missed cancers on prior mammograms (i.e., the mammograms in previous
examinations on which the mass or cancer can be seen retrospectively but was considered
negative or benign at the time of the examination). Some researchers have investigated the
performance change of CAD systems when using prior mammograms as input. In our study of mass
detection on prior mammograms, we obtained a case-based sensitivity of 74% (20/27) of the
malignant masses with 2.2 false positives (FPs) per image. te Brake et al.26 reported that
their CAD system has a case-based sensitivity of 34% (22/65) of the cancers which have the
appearance of masses or stellate lesions in the prior examinations with 1 FP per image. A
commercial system (R2 ImageChecker) also reported detection of 42% (72/172) of the cancers in
the prior years which were considered worthy of call-back in retrospect by expert
mammographers with about 2 FP marks/case. Zheng et al. reported that their CAD system
trained with current mammograms could not perform optimally in prior mammograms and vice
versa, whereas the same system trained with prior mammograms can perform better on detecting
the masses on prior mammograms. Recently, an assessment study was conducted to compare the
performance of two commercial systems and one research CAD system on current mammograms and
prior mammograms. The results showed that the true positive (TP) fraction for CAD systems on
prior mammograms of 39 breasts with malignant masses ranged from 15% to 26% with 0.28 to 0.41
FP marks/image. Although the detection performance reported in the different studies varies,
probably due to the differences in the data sets used, these studies indicate that the
sensitivities of current CAD systems in detecting subtle masses on prior mammograms are
substantially lower than those obtained from detection on current mammograms. The difficulty
in recognizing the subtle and possibly different features of the masses on priors compared to
those of the masses on current mammograms may be one of the factors that causes oversight for
both radiologists and the CAD systems.

The goal of pattern recognition is to achieve the best possible classification performance
in the task at hand. Researchers have shown that, for a class of objects with a wide range of
characteristics, the classification performance can be improved by using a combination of
classifiers whereby objects of certain characteristics are classified by one classifier using a
set of features and objects of different characteristics by another classification scheme
based on different features. The advantage of using a combination of classifiers is that it may
stabilize the training of classifiers even with a relatively small sample size because each
classifier does not have to accommodate a wide range of characteristics and features.36,37
These observations motivated our interest in the design of a dual CAD system for mass
detection.

Since the missed cancers on prior mammograms represent the difficult cases that are more
likely to be missed by radiologists if similar cancers occur on screening mammograms, it is
important to improve the sensitivity of the CAD system in detecting these cancers. On the
other hand, when a CAD system is applied to a new mammogram in clinical practice, it has to
detect breast lesions of all degrees of subtlety effectively. However, it is difficult to
train a single CAD system to provide optimal detection for all lesions over the entire
spectrum of subtlety because the classifiers have to make compromises to accommodate cancers
of a wide range of characteristics. Therefore, we have been exploring a new dual CAD system
approach that combines a CAD system trained with retrospectively seen masses on prior
mammograms with a CAD system trained with masses detected on current mammograms. In this
paper, we will describe the design of the dual CAD system and report our current results.

II.  MATERIALS  AND  METHOD 
A.  Data  sets 

All mammograms in this study were collected from patient files in the Department of
Radiology at the University of Michigan with Institutional Review Board (IRB) approval. The
mammograms were digitized with a LUMISYS 85 laser film scanner with a pixel size of
50 µm × 50 µm and 4096 gray levels. The scanner was calibrated to have a linear relationship
between gray levels and optical densities (O.D.) from 0.1 to greater than 3 O.D. units. The
nominal O.D. range of the scanner is 0-4. The full resolution mammograms were first smoothed
with a 2 × 2 box filter and subsampled by a factor of 2, resulting in 100 µm × 100 µm images.
The images at a pixel size of 100 µm × 100 µm were used for the input of our CAD system.
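
The smoothing and subsampling step can be illustrated with a minimal sketch. It assumes
that the 2 × 2 box filter followed by subsampling by 2 is equivalent to averaging each
non-overlapping 2 × 2 block of pixels, which is one common reading of the description;
the function name and the toy image are hypothetical.

```python
def downsample_2x(image):
    """Average each non-overlapping 2x2 block of a 2-D list of gray levels.

    A trailing odd row or column, if present, is dropped.
    """
    rows = len(image) - len(image) % 2
    cols = len(image[0]) - len(image[0]) % 2
    out = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            block_sum = (image[r][c] + image[r][c + 1]
                         + image[r + 1][c] + image[r + 1][c + 1])
            out_row.append(block_sum / 4.0)
        out.append(out_row)
    return out


# Toy 4x4 "full resolution" image (50 um pixels) reduced to 2x2 (100 um pixels).
full_res = [
    [0, 4, 8, 8],
    [4, 8, 8, 8],
    [100, 100, 0, 0],
    [100, 100, 0, 4],
]
half_res = downsample_2x(full_res)
print(half_res)
```

Each output pixel covers four input pixels, so the pixel size doubles from 50 µm to 100 µm
in each direction.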

We collected three data sets. The first data set contained 115 cases with confirmed masses.
Each case included the current mammograms that prompted the radiologist to work up the mass.
This is referred to as the “average” mass set. All of the cases in the average mass set had
two mammographic views: the craniocaudal view and the mediolateral oblique view or the lateral
view, thus yielding a total of 230 mammograms. There were 115 masses (67 malignant masses and
48 benign masses) in this data set, of which 105 were biopsy-proven and 10 were determined to
be benign by long-term follow-up.

The second data set was composed of the prior mammograms dated one to two years earlier
than the mammograms of the same patients in the average mass set. Since the masses on prior
mammograms are on average subtler than those on current mammograms, this data set is referred
to as the “subtle” mass set. On 5 of the 115 patients, no mass or focal density could be
identified on either view of the prior mammograms. Therefore, the subtle mass set was composed
of 110 cases (62 malignant and 48 benign). For the purpose of training the subtle mass
detection system, the subtle masses do not have to be obtained from the same cases as the
average mass set but we used the available prior mammograms for these mass cases in our
database. Nineteen of the 110 cases had two prior mammogram examinations. Of the 129
examinations in the subtle mass set, 123 had two mammographic views and 6 had three views,
with a total of 264 mammograms. Many of the subtle masses on the prior mammograms could be
identified only as a focal density corresponding to the location of the subsequently detected
mass on the current mammograms. On 44 of the two-view prior mammograms, the mass location was
evident only on one view. Table I summarizes the information for the average and subtle mass
subsets.

Table I. Description of cases in the average and subtle mass data sets and
the subsets for training and testing in the cross-validation scheme.

                                        Mass subset 1          Mass subset 2
                                      Average    Subtle      Average    Subtle
Total No. of cases                         57        54           58        56
Cases with two prior examinations          NA        10           NA         9
Exams with two views                       57        58           58        65
Exams with three views                      0         6            0         0
Total No. of images                       114       134          116       130
No. of negative images                      0        25            0        19
No. of mass images for training           114       109          116       111
No. of two-view pairs for testing          57        64           58        65
No. of images for testing                 114       128          116       130
No. of malignant masses                    36        33           31        29
No. of benign masses                       21        21           27        27
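
The counts quoted for the subtle mass set are mutually consistent, which can be verified
with a few lines of arithmetic. The sketch below only re-derives totals from numbers stated
in the text and in Table I; the variable names are ours.

```python
# Counts quoted in the text for the subtle mass set.
cases = 110
cases_with_two_prior_exams = 19
two_view_exams, three_view_exams = 123, 6

exams = cases + cases_with_two_prior_exams           # one extra exam per such case
images = 2 * two_view_exams + 3 * three_view_exams   # views per exam times exams

# Table I, subtle mass columns (subset 1, subset 2).
table_cases = (54, 56)
table_images = (134, 130)
table_negative_images = (25, 19)
table_training_mass_images = (109, 111)

checks = {
    "exams match view breakdown": exams == two_view_exams + three_view_exams == 129,
    "images total 264": images == sum(table_images) == 264,
    "cases total 110": sum(table_cases) == cases,
    "mass images = images - negatives": all(
        total - neg == mass
        for total, neg, mass in zip(
            table_images, table_negative_images, table_training_mass_images
        )
    ),
}
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```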

The  third  data  set  was  composed  of  260  normal  bilateral 
two-view  mammograms  obtained  from  65  patients.  No 
masses  were  evident  on  these  mammograms  upon  review  by 
the  experienced  radiologist. 

The two mass data sets were used to estimate the detection sensitivity and the normal data
set was used for estimating the FP marker rate. For the mass data sets, the true locations of
the masses were identified by an experienced MQSA radiologist using all available imaging and
clinical information. The radiologist also provided an estimate of the longest diameter of the
mass, descriptors of its margin and shape, a visibility rating, and an estimate of the breast
density in terms of BI-RADS category. Figure 1 shows the distributions of mass sizes, mass
shapes, mass margins, and their visibility on a 10-point rating scale with 1 representing the
most visible masses and 10 the most difficult case relative to the cases seen in their
clinical practice. The masses had a mean size of 13.7 mm and a median of 12 mm in the average
data set and a mean of 9.7 mm and a median of 10 mm in the subtle data set. Figure 2 shows the
breast density for both the normal data set and the mass data sets. As can be seen from the
distributions of the mass characteristics, the average masses on the current mammograms and
the subtle masses on the priors had large overlap. Nevertheless, on average, the subtle masses
were smaller in size and less conspicuous on the mammograms.


B.  Methods 

In order to improve the sensitivity of detecting breast lesions of all degrees of subtlety,
we developed a new dual system approach which combines a system trained with average masses
with another system trained with subtle masses. When the trained dual system is applied to an
unknown mammogram, the two CAD systems are used in parallel to detect suspicious objects on a
single mammogram. No prior mammogram is needed. The additional FPs from the use of the two
systems are reduced by an information fusion stage. We will refer to the two systems
separately trained with the average masses and the subtle masses as “single” CAD systems in
the following discussions.

We randomly separated the mass data sets by case into two independent subsets. Both the
average and subtle mass subsets followed the same case grouping so that mammograms from the
same case would not be separated into the training subset for one single CAD system and the
test subset for the other single CAD system in a cross-validation cycle. Table I shows the
subsets of cases in the average and subtle mass data sets. Two-fold cross validation was used
for training and testing the algorithms. The training included selecting proper parameters for
each single CAD system and for information fusion. Once the training with one mass subset was
completed, the parameters were fixed for testing with the other mass subset. The training and
test mass subsets were switched and the training and test processes were repeated. The CAD
systems were trained with single mammograms. To maximize the number of training images with
masses, all images with a visible mass were included regardless of whether they were a part of
a two-view or three-view case when the subtle mass subset was used as a training set. However,
when the subtle mass subset was used as a test set, only two views were included for each case
because we used two-view mammograms to derive the case-based test performance. For cases
containing three views, we therefore included only two of the views in testing. We also
included cases with the mass visible on only one of the two views. After the two-fold cross
validation testing, the overall detection performance was evaluated by combining the
performances of the two test subsets. The trained algorithms with the fixed parameters were
also applied to the normal set of mammograms, which was not used during training, to estimate
the FP rate in screening mammograms.
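
The case grouping described above can be sketched as follows. This is a minimal
illustration, not the code used in the study: the case identifiers, image names, seed, and
helper functions are hypothetical, and every case is given exactly two current and two prior
views for simplicity. The point is that the split is made once, by case, and the same
grouping is then applied to both the average and the subtle image lists, so no case
contributes to training in one system and testing in the other within a cycle.

```python
import random


def split_cases(case_ids, seed=0):
    """Randomly split case IDs into two disjoint subsets of near-equal size."""
    ids = sorted(case_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def images_for(cases, images_by_case):
    """Collect every image belonging to the given cases (cases without images are skipped)."""
    return [img for case in cases for img in images_by_case.get(case, [])]


# Hypothetical identifiers: 115 cases, each with two current and two prior views.
case_ids = [f"case{n:03d}" for n in range(1, 116)]
average_images = {c: [f"{c}_cur_CC", f"{c}_cur_MLO"] for c in case_ids}
subtle_images = {c: [f"{c}_pri_CC", f"{c}_pri_MLO"] for c in case_ids}

subset_1, subset_2 = split_cases(case_ids)

# Two-fold cross validation: train on one subset, test on the other, then swap.
cycles = []
for train_cases, test_cases in [(subset_1, subset_2), (subset_2, subset_1)]:
    cycles.append({
        "train_average": images_for(train_cases, average_images),
        "train_subtle": images_for(train_cases, subtle_images),
        "test_average": images_for(test_cases, average_images),
        "test_subtle": images_for(test_cases, subtle_images),
    })

print(len(subset_1), len(subset_2))
```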


1.  Single  CAD  system  overview 

The major steps in the two single mass detection systems are similar but the feature spaces
and classifiers for FP reduction in each system were designed separately to suit the
characteristics of average and subtle masses, respectively. The two systems are therefore
described together in the following but the differences will be pointed out whenever
applicable. Each single CAD system consists of four processing steps: (1) prescreening of mass
candidates, (2) segmentation of suspicious objects, (3) feature extraction and analysis, and
(4) FP reduction by classification of normal tissue structures and masses. The block diagram
for the detection scheme is shown in Fig. 3.

Fig. 1. The characteristics of the masses in our mass data set: (a) distribution of mass
sizes, (b) distribution of mass visibility on a 10-point rating scale with 1 representing the
most visible masses and 10 the most subtle masses relative to the cases seen in clinical
practice, (c) distribution of mass shapes, (d) distribution of mass margins, C: circumscribed,
Ind: indistinct, M: microlobulated, Ob: obscured, Sp: spiculated.

Fig. 2. The distribution of breast density in terms of BI-RADS categories estimated by an
MQSA radiologist.

Fig. 3. Schematic diagram of our single CAD system for mass detection. The FP classification
stage includes rule-based classification, a morphological LDA classifier, and a texture
feature LDA classifier for differentiating masses from normal breast tissues.

For the prescreening stage, we have developed a two-stage gradient field analysis method
which not only uses the shape information of masses on mammograms but also incorporates the
gray level information of the local object segmented by a region growing technique in the
second stage to refine the gradient field analysis.24,40 Locations of high radial gradient
convergence are labeled as mass candidates. After prescreening, the suspicious objects are
identified by using a two-stage segmentation method.41 First, the background-corrected ROI is
weighted by a two-dimensional Gaussian function with σ = 256 pixels to enhance the central
region. Sobel filtering is then applied to the Gaussian-weighted ROI to generate another
enhanced image. Second, k-means clustering using the pixel values from these two images as
features is used to segment the object. For each suspicious object, eleven morphological
features were extracted. Rule-based and linear discriminant classifiers were trained, using
the training data set only, to remove the detected structures that were substantially
different from breast masses. For the system trained with average masses, global and local
multiresolution texture analyses42 were performed in each ROI by using the spatial gray level
dependence (SGLD) matrices. A total of 364 features were extracted from global texture
analysis. Local texture features were extracted from the local region containing the detected
object and the peripheral regions within each ROI. A total of 208 features were extracted for
local texture analysis. For the system trained with subtle masses, instead of the SGLD
texture features, gray level features and run length statistics analysis (RLS) texture
features43 were extracted inside and outside of each mass region on the original image and
gradient field image. The gray level features included the contrast of the object relative to
the surrounding background, the minimum and the maximum gray levels, and the characteristics
derived from the gray level histogram in the regions inside and outside of each object
including skewness, kurtosis, energy, and entropy. Five RLS texture features were extracted in
both the horizontal and vertical directions: short runs emphasis, long runs emphasis, gray
level nonuniformity, run length nonuniformity, and run percentage. A total of 66 features were
extracted for the system trained with subtle masses.
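
To make the five RLS features concrete, the sketch below computes them for a small
gray-level image using commonly cited run-length definitions. It is an illustration under
assumptions: the normalization used in the cited RLS reference may differ, the function
names are ours, and no region masking or gray-level requantization is included.

```python
from collections import Counter


def horizontal_runs(image):
    """Map (gray_level, run_length) -> number of runs found along the rows."""
    runs = Counter()
    for row in image:
        start = 0
        for i in range(1, len(row) + 1):
            # A run ends at the end of the row or where the gray level changes.
            if i == len(row) or row[i] != row[start]:
                runs[(row[start], i - start)] += 1
                start = i
    return runs


def rls_features(image):
    """Five run-length statistics for runs taken in the horizontal direction."""
    runs = horizontal_runs(image)
    n_runs = sum(runs.values())
    n_pixels = sum(len(row) for row in image)
    per_level, per_length = Counter(), Counter()
    for (level, length), count in runs.items():
        per_level[level] += count
        per_length[length] += count
    return {
        "short_runs_emphasis": sum(c / (l * l) for (_, l), c in runs.items()) / n_runs,
        "long_runs_emphasis": sum(c * l * l for (_, l), c in runs.items()) / n_runs,
        "gray_level_nonuniformity": sum(c * c for c in per_level.values()) / n_runs,
        "run_length_nonuniformity": sum(c * c for c in per_length.values()) / n_runs,
        "run_percentage": n_runs / n_pixels,
    }


toy = [
    [1, 1, 2, 2],
    [1, 2, 2, 2],
    [3, 3, 3, 3],
]
horizontal = rls_features(toy)
# Vertical runs are obtained by transposing the image and reusing the same code.
vertical = rls_features([list(col) for col in zip(*toy)])
print(horizontal)
print(vertical)
```

Long uniform rows raise the long runs emphasis and lower the run percentage, which is why
these features respond to the coarseness of the texture inside and outside an object.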

In order to obtain the best texture feature subset and also
reduce the dimensionality of the feature space to design an
effective classifier, stepwise feature selection with linear
discriminant analysis (LDA) was applied to the training subset.
The detailed procedure has been described elsewhere.-4'44'45
Briefly, at each step one feature was entered or removed
from the feature pool by analyzing its effect on the selection
criterion, which was chosen to be the Wilks’ lambda in this
study. Since the appropriate values of thresholds for feature
entry, feature elimination, and tolerance of correlation for
feature selection were unknown, we used an automated simplex
optimization method to search for the best combination
of thresholds in the parameter space. The simplex algorithm
used a leave-one-case-out resampling method within the
training subset to select features and estimate the weights for
the LDA classifier. To have a figure-of-merit to guide feature
selection, the test discriminant scores from the left-out cases
were analyzed using receiver operating characteristic (ROC)
methodology.46 The accuracy for classification of masses and
FPs was evaluated as the area under the ROC curve, Az. In
this approach, feature selection was performed without the
left-out case so that the test performance would be less
optimistically biased.47 However, the selected feature set in
each leave-one-case-out cycle could be slightly different
because every cycle had one training case different from the
other cycles. In order to obtain a single trained classifier to
apply to the independent test subset, a final stepwise feature
selection was performed with the best combination of thresholds,
found in the simplex optimization procedure, on the
entire training subset to obtain the final set of features and
estimate the weights of the LDA. Note that the entire process
of feature selection and classifier weight estimation was
performed within the training subset. The LDA classifier with
the selected feature set was then fixed and applied to the
independent test subset. The training and testing processes
were performed independently for the two-fold
cross-validation sets.

Fig. 4. Schematic diagram of proposed dual CAD system for mass
detection. BP-ANN is used for information fusion.
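The leave-one-case-out resampling described in this subsection can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes only two features per object, omits the stepwise feature selection and simplex search, and swaps in the empirical (Mann-Whitney) area under the ROC curve in place of the fitted Az. The names `fit_lda`, `empirical_auc`, `leave_one_case_out_auc`, and the toy data are hypothetical.

```python
def fit_lda(samples):
    """Fit a two-feature Fisher discriminant; returns (weights, bias)."""
    pos = [x for x, label, _ in samples if label == 1]
    neg = [x for x, label, _ in samples if label == 0]

    def mean(rows):
        return [sum(r[i] for r in rows) / len(rows) for i in range(2)]

    mu_pos, mu_neg = mean(pos), mean(neg)
    # Pooled within-class scatter matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, mu in ((pos, mu_pos), (neg, mu_neg)):
        for r in rows:
            d = (r[0] - mu[0], r[1] - mu[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = (mu_pos[0] - mu_neg[0], mu_pos[1] - mu_neg[1])
    # w = S^-1 (mu_pos - mu_neg), using the closed-form 2 x 2 inverse.
    w = ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (s[0][0] * diff[1] - s[1][0] * diff[0]) / det)
    # Bias places the decision boundary midway between the class means.
    mid = ((mu_pos[0] + mu_neg[0]) / 2, (mu_pos[1] + mu_neg[1]) / 2)
    return w, -(w[0] * mid[0] + w[1] * mid[1])


def empirical_auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of the area under the ROC curve."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def leave_one_case_out_auc(samples):
    """samples: list of (features, label, case_id); label 1 = mass, 0 = FP."""
    pos_scores, neg_scores = [], []
    for held_out in sorted({case for _, _, case in samples}):
        # Train without any object from the held-out case.
        w, b = fit_lda([s for s in samples if s[2] != held_out])
        for x, label, case in samples:
            if case == held_out:
                score = w[0] * x[0] + w[1] * x[1] + b
                (pos_scores if label == 1 else neg_scores).append(score)
    # Pool the left-out scores from all cycles into one figure of merit.
    return empirical_auc(pos_scores, neg_scores)


# Toy data (hypothetical): four cases, one mass and two FPs in each.
TOY_SAMPLES = [
    ((2.0, 2.1), 1, "A"), ((0.1, -0.2), 0, "A"), ((-0.3, 0.4), 0, "A"),
    ((1.8, 2.5), 1, "B"), ((0.4, 0.1), 0, "B"), ((-0.2, -0.5), 0, "B"),
    ((2.4, 1.7), 1, "C"), ((0.0, 0.6), 0, "C"), ((0.5, -0.1), 0, "C"),
    ((2.1, 2.2), 1, "D"), ((-0.4, 0.0), 0, "D"), ((0.2, 0.3), 0, "D"),
]
```

Holding out whole cases rather than single objects keeps every object of a left-out case away from training, which is the property the text relies on to reduce optimistic bias.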

2.  Training  and  test  for  dual  system 

The block diagram for the dual system is shown in Fig. 4.
During the training of the dual system, we used the current
and prior mammograms from the same patients. The current
mammograms that contained the average masses were used
only to train the first single CAD system. The prior mammograms
that contained the subtle masses were used only to
train the second single CAD system. The prescreening and
the segmentation steps in the two systems are identical.
Since the morphological appearances of average and subtle
masses are different, the rules in the morphological rule-based
FP classification are trained differently for the two
single CAD systems. During testing with an independent
mammogram, the dual system keeps all the suspicious objects
that satisfy the FP classification rules of either single
CAD system and applies the LDA classifiers from both
single systems to each object. Each object thus has two LDA
scores.

To merge the information from the two CAD systems, a
fusion scheme was developed for our dual system. In this
study, a feed-forward backpropagation artificial neural network
(BP-ANN) was trained to differentiate masses from normal
tissues by combining the output information from the
two single CAD systems. The LDA classifiers from the two
single CAD systems were applied to each detected object.
The two LDA discriminant scores for each object were used
as input to the BP-ANN. The BP-ANN had an input layer
with two nodes, a hidden layer with N nodes, and an output
layer with one node. The nodes were interconnected by
weights, and information propagated from one layer to the
next through a log-sigmoidal activation function. The learning
of the ANN was a supervised process in which known
training cases were input to the ANN. The performance function
for the network was the mean-squared error between the
network outputs and the target outputs. The weights of the
network were adjusted iteratively by a feedforward
backpropagation procedure to minimize the error. A detailed
description of the backpropagation neural network can be
found in the literature.48'49
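As a rough illustration of the 2-N-1 architecture and its performance function, the forward pass and the mean-squared error might be written as below. This is a sketch under assumptions: the output node is taken to use the same log-sigmoid activation as the hidden nodes, bias terms are included, and the function names are hypothetical. The weight update by backpropagation is not shown.

```python
import math


def logsig(x):
    """Log-sigmoid activation, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(lda_scores, hidden_w, hidden_b, out_w, out_b):
    """Forward pass of a 2-N-1 network.

    lda_scores: the two LDA discriminant scores of one detected object.
    hidden_w:   N pairs of input-to-hidden weights.
    hidden_b:   N hidden-node biases (assumed; not stated in the text).
    out_w:      N hidden-to-output weights.
    out_b:      output-node bias (assumed).
    """
    hidden = [
        logsig(w[0] * lda_scores[0] + w[1] * lda_scores[1] + b)
        for w, b in zip(hidden_w, hidden_b)
    ]
    return logsig(sum(w * h for w, h in zip(out_w, hidden)) + out_b)


def mean_squared_error(outputs, targets):
    """Performance function: mean of squared output-target differences."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)
```

With all weights and biases at zero, every node outputs 0.5, which gives a convenient sanity check before training.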

To choose the number of hidden nodes (N) in the BP-ANN,
we used a three-fold cross-validation method within
the training subset. We randomly separated the entire training
subset, including all detected objects, into three independent
groups. The objects belonging to the same case were assigned
to the same group. For a given N, three training
cycles were performed, in each of which two of the three
groups were used to train the BP-ANN and the left-out group
was used to test its performance. The Az value obtained from
the ANN output scores for the test group was used as the
performance index for that training cycle. The average of the
Az values from the three test groups represented the performance
of the BP-ANN with N hidden nodes. In our experiment,
a BP-ANN with 3 hidden nodes provided the largest
average Az value and was therefore chosen. The weights of
the chosen BP-ANN were retrained with the entire training
subset. The BP-ANN with the trained weights was used to
merge the information from the two single CAD systems.
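The case-grouped three-fold selection of N described above can be sketched as follows. The training and scoring of the network is left as a caller-supplied function `train_and_score` (hypothetical), and the round-robin assignment of shuffled cases to folds is an assumption, since the text says only that the split was random and kept each case within one group.

```python
import random


def grouped_folds(objects, n_folds=3, seed=0):
    """Split (case_id, payload) objects into folds without splitting a case."""
    case_ids = sorted({case for case, _ in objects})
    random.Random(seed).shuffle(case_ids)
    # Round-robin assignment of the shuffled cases to folds.
    fold_of = {case: i % n_folds for i, case in enumerate(case_ids)}
    folds = [[] for _ in range(n_folds)]
    for obj in objects:
        folds[fold_of[obj[0]]].append(obj)
    return folds


def choose_hidden_nodes(candidates, folds, train_and_score):
    """Return (best N, mean score) over the left-out folds.

    train_and_score(n, train_objects, test_objects) must train a network
    with n hidden nodes and return its test figure of merit (e.g., Az).
    """
    best_n, best_mean = None, float("-inf")
    for n in candidates:
        scores = []
        for k, test_fold in enumerate(folds):
            train = [o for i, f in enumerate(folds) if i != k for o in f]
            scores.append(train_and_score(n, train, test_fold))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_mean:
            best_n, best_mean = n, mean_score
    return best_n, best_mean
```

After N is chosen this way, the network would be retrained once on the entire training subset, as the text describes.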

To test the dual system, the two trained single CAD systems,
one trained with the average mass set and the other
with the subtle mass set, were applied in parallel to each
single “unknown” mammogram in the independent test subset.
No prior mammogram was needed during testing.

3.  Evaluation  methods 

The detected individual objects were compared with the
“truth” ROI marked by the experienced radiologist, as described
earlier. A detected object was scored as TP if the
area of overlap between the bounding box of the detected
object and the bounding box of the true mass, measured
relative to the larger of the two bounding boxes, exceeded 25%.
Otherwise, it was scored as FP. The 25% threshold was
selected as described in our previous study.”1
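The TP scoring rule can be written out in a few lines. This is a sketch that assumes axis-aligned boxes given as `(x0, y0, x1, y1)` in pixel coordinates; the box representation and function names are hypothetical.

```python
def box_area(box):
    """Area of an (x0, y0, x1, y1) box; zero if the box is degenerate."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def overlap_fraction(detected, truth):
    """Intersection area divided by the area of the larger box."""
    intersection = (
        max(detected[0], truth[0]), max(detected[1], truth[1]),
        min(detected[2], truth[2]), min(detected[3], truth[3]),
    )
    larger = max(box_area(detected), box_area(truth))
    return box_area(intersection) / larger if larger else 0.0


def is_true_positive(detected, truth, threshold=0.25):
    """TP if the overlap fraction is strictly greater than the threshold."""
    return overlap_fraction(detected, truth) > threshold
```

Normalizing by the larger box means that a very large detection enclosing a small mass, or a tiny detection inside a large mass, is not automatically counted as a hit.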

The FP marker rate was estimated in two ways: one from
detection on the same test subsets with masses, the other
from detection on the normal data set of negative mammograms.
For the latter, we applied the trained dual CAD system
to the normal data set. The number of FP marks produced
by the CAD system was determined by counting the
detected objects on the normal cases. The mass detection
sensitivity was determined by counting the detected masses
on the test mass subset. The detection performance of the
CAD system was assessed by free response ROC (FROC)
analysis. A FROC curve was obtained by plotting the mass
detection sensitivity as a function of the number of FP marks
per image, obtained from either the mass data subset or the
normal set, at the corresponding decision threshold.
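A FROC curve of this kind can be traced by sweeping the decision threshold over the observed scores. The sketch below assumes one score per detected mass (the highest-scoring TP mark on it) and a flat list of FP mark scores from whichever image set is used for FP estimation; the function and argument names are hypothetical.

```python
def froc_curve(tp_scores, fp_scores, n_masses, n_fp_images):
    """Return FROC points as (FP marks per image, sensitivity).

    tp_scores:   best score of each detected mass (undetected masses absent).
    fp_scores:   scores of all FP marks on the images used for FP counting.
    n_masses:    total number of true masses in the mass test subset.
    n_fp_images: number of images on which the FP marks were counted
                 (the mass subset or the normal set).
    """
    thresholds = sorted(set(tp_scores) | set(fp_scores), reverse=True)
    curve = []
    for t in thresholds:
        sensitivity = sum(s >= t for s in tp_scores) / n_masses
        fp_per_image = sum(s >= t for s in fp_scores) / n_fp_images
        curve.append((fp_per_image, sensitivity))
    return curve
```

Passing FP scores and an image count from the normal set instead of the mass subset gives the second of the two FP estimates described above without changing the sensitivity axis.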

FROC curves were presented on a per-mammogram and a
per-case basis. For image-based FROC analysis, the mass on
each mammogram was considered an independent true object.
For case-based FROC analysis, the same mass imaged
on the two-view mammograms was considered to be one true
object and detection of either or both masses on the two
views was considered to be a TP detection.

Fig. 5. An example of a scatter plot of the LDA scores from the two single
CAD systems which are used as input to the BP-ANN. The correlation
coefficient between the scores of two LDA classifiers is 0.46, indicating that
the two LDA scores are essentially independent features.

Since we used a two-fold cross-validation method for training
and testing, we obtained two test FROC curves, one for
each test subset, for each of the conditions (e.g., single CAD
system approach or dual system approach). To summarize
the results for comparison, an average test FROC curve was
derived by averaging the FP rates at the same sensitivity
along the FROC curves of the two corresponding test subsets.
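Averaging two FROC curves at matched sensitivities requires reading an FP rate off each curve at a sensitivity that may fall between operating points. The sketch below assumes linear interpolation between adjacent points and a curve anchored at the origin; the text does not say how intermediate values were obtained, so both are assumptions, and the function names are hypothetical.

```python
def fp_rate_at(curve, sensitivity):
    """FP marks/image at which a curve first reaches the given sensitivity.

    curve: (fp_per_image, sensitivity) points sorted by increasing FP rate.
    Returns None if the curve never reaches the requested sensitivity.
    """
    prev_fp, prev_sens = 0.0, 0.0  # assumed starting point of the curve
    for fp, sens in curve:
        if sens >= sensitivity:
            if sens == prev_sens:
                return fp
            # Linear interpolation between the two bracketing points.
            frac = (sensitivity - prev_sens) / (sens - prev_sens)
            return prev_fp + frac * (fp - prev_fp)
        prev_fp, prev_sens = fp, sens
    return None


def average_froc(curve_a, curve_b, sensitivities):
    """Average the FP rates of two test curves at each common sensitivity."""
    averaged = []
    for s in sensitivities:
        fp_a, fp_b = fp_rate_at(curve_a, s), fp_rate_at(curve_b, s)
        if fp_a is not None and fp_b is not None:
            averaged.append(((fp_a + fp_b) / 2, s))
    return averaged
```

Sensitivities that only one of the two curves reaches are skipped rather than extrapolated, so the averaged curve stops where the weaker curve stops.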

In  order  to  compare  the  performance  of  the  single  CAD 
system  and  the  dual  CAD  system,  we  applied  the  alternative 
free-response  ROC  (AFROC)  method  and  the  jackknife  free- 
response  ROC  (JAFROC)  method  developed  by  Chakraborty 
et  al.50’51  to  the  pairs  of  FROC  curves.  In  the  AFROC 
method,  the  FROC  data  are  first  transformed  by  counting  the 
number  of  false-positive  images  (FPIs)  instead  of  the  FPs  per 
image.  The  confidence  rating  of  a  FPI  is  determined  by  the 
highest  confidence  FP  decision  on  the  image  regardless  of 
how  many  lower  confidence  FP  decisions  are  made  on  the 
same  image.  The  ROCKIT  curve  fitting  software  and  statistical 
significance  tests  for  ROC  analysis  developed  by  Metz  et 
al.46  can  then  be  used  to  analyze  the  AFROC  data. 
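The AFROC transformation described above reduces each image to a single false-positive rating. A minimal sketch follows, assuming FP marks are grouped per image in a dictionary and that images with no FP marks count as never being false-positive images; the names and data layout are hypothetical, and the curve fitting itself is not reproduced.

```python
def fpi_ratings(fp_scores_by_image):
    """Rating of each false-positive image: its highest-confidence FP mark."""
    return {
        image: max(scores)
        for image, scores in fp_scores_by_image.items()
        if scores
    }


def afroc_points(tp_scores, fp_scores_by_image, n_masses):
    """Return (false-positive image fraction, sensitivity) points.

    tp_scores:          best score of each detected mass.
    fp_scores_by_image: image id -> list of FP mark scores (may be empty).
    n_masses:           total number of true masses.
    """
    ratings = list(fpi_ratings(fp_scores_by_image).values())
    n_images = len(fp_scores_by_image)
    thresholds = sorted(set(tp_scores) | set(ratings), reverse=True)
    return [
        (sum(r >= t for r in ratings) / n_images,
         sum(s >= t for s in tp_scores) / n_masses)
        for t in thresholds
    ]
```

Because only the highest-rated FP on each image counts, the horizontal axis is bounded by 1, which is what allows conventional ROC fitting software to be applied to the transformed data.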


III.  RESULTS 

Figure 5 shows an example of the two-dimensional feature
space that was used as the input to the BP-ANN being
trained to merge the information from the two single CAD
subsystems. The two features are the output scores of the
LDA classifiers trained with the average masses and with the
subtle masses. The correlation coefficients between the two
features were 0.46 and 0.44 for the two training subsets,
respectively. The low correlation indicated that the two single
CAD systems extracted relatively independent features from
the object. The Az values of the chosen ANN were 0.92±0.01
and 0.87±0.01, respectively, as estimated by validation in
the training process. The ANN classifiers achieved Az values
of 0.90±0.02 and 0.89±0.01 on the two independent test
subsets, respectively. Figure 6 shows the ROC curves for the
two test subsets.

Fig. 6. The test ROC curves for the BP-ANN classifiers from the two
independent mass subsets. The ANN classifiers achieved an Az value of
0.90±0.02 for test subset 1 and 0.89±0.01 for test subset 2 in the
classification of mass and normal breast tissues.

In order to evaluate the effectiveness of our dual system
approach, we compared its performance on the test subsets
containing average masses with that of two single CAD systems:
the CAD system trained only on the average mass set
and the CAD system trained on both the average and the
subtle mass sets. When a single CAD system was trained
only with the average masses, the number of selected texture
features was 21 (14 global and 7 local) and 16 (10 global and 6
local) for the two independent training subsets, respectively.
When the CAD system was trained with both the average and
the subtle masses, the number of selected texture features was
17 (11 global and 6 local) and 18 (7 global and 11 local) for
the two independent training subsets, respectively.

For the dual system, the single system trained with the
average masses was the same as that described earlier. For
the single system trained with subtle masses, four (2 gray
level and 2 RLS texture) and five (3 gray level and 2 RLS
texture) features were selected for the two independent training
subsets, respectively.

The average test FROC curves of the dual CAD system
on the test subsets with average masses are compared to
those of the single CAD systems in Fig. 7. The FP rates were
estimated from the mass data set. The dual CAD system
achieved a case-based sensitivity of 80%, 85%, and 90% at
0.6, 0.8, and 1.0 FPs/image, respectively, compared with 1.3,
1.5, and 1.8 FPs/image for the single CAD system trained
with average masses alone. The performance of the single
CAD system trained with both the average masses and the
subtle masses was comparable to that trained with average
masses alone, with FP rates of 1.4, 1.6, and 1.8 FPs/image at
the same sensitivities, respectively. Figure 8 shows the
comparison of the three average test FROC curves, similar to
those shown in Fig. 7, except that the FP rates were estimated
from the normal data set. The FP rates at a few selected
sensitivities for the dual and single CAD systems are
summarized in Table II.


[Fig. 7 plots, panels (a) and (b); horizontal axis: number of false positives per image.]

Fig. 7. Comparison of the average test FROC curves obtained from
averaging the FROC curves of the two independent average-mass subsets. Three
CAD systems were compared: a single CAD system trained with average
masses alone, a single CAD system trained with both the average and the
subtle masses, and the dual CAD system. The FP rate was estimated from
the mammograms with masses. (a) Image-based FROC curves, (b) case-based
FROC curves.


In this study, we had 67 malignant cases in the average
mass set. Figure 9 compares the average test FROC curves of
the single CAD system and the dual system for detection of
malignant masses. The result for the single CAD system
trained with average masses is shown, and the FP rate was
estimated from the mammograms without masses. In this
case, the dual CAD system achieved a case-based sensitivity
of 80%, 85%, and 90% at 0.6, 0.9, and 1.2 FP marks/image,
respectively, compared with 1.1, 1.6, and 2.0 FP marks/image
for the single CAD system.

An important purpose of a CAD system is to serve as a
second reader to alert radiologists to subtle cancers that may
be overlooked. Figures 10 and 11 compare the average
FROC curves of the single CAD system and the dual system
for detection in the test subsets with subtle masses. The TP
rate in Fig. 10 was estimated by including both malignant
and benign masses and that in Fig. 11 was estimated from
malignant masses only. The single CAD system trained with
average masses alone was used. The FP rates for both systems
were estimated from the mammograms without masses.
The dual CAD system achieved a case-based sensitivity of
50% at 0.7 FP marks/image for all masses and at 0.5 FP
marks/image for malignant masses only, compared with 1.4
FP marks/image for all masses and 1.1 FP marks/image for
malignant masses only using the single CAD system.

Fig. 8. Comparison of the average test FROC curves obtained from
averaging the FROC curves of the two independent average-mass subsets. Three
CAD systems were compared: a single CAD system trained with average
masses only, a single CAD system trained with the average and the subtle
masses, and the dual CAD system. The FP rate was estimated from the
mammograms without masses. (a) Image-based FROC curves, (b) case-based
FROC curves.

Fig. 9. Comparison of the average test FROC curves of the single CAD
system and the dual CAD system for detection of malignant masses in the
average data set. The single system trained with average masses alone was
used and the FP rate was estimated from the mammograms without masses.
(a) Image-based FROC curves, (b) case-based FROC curves.

Table II. Comparison of case-based detection performance between the
dual system and the single CAD system trained with average masses alone.
The FP marker rates were estimated from detection on the normal data set.
The FROC curves were obtained by averaging the FROC curves of the two
test subsets.

                Average mass test set           Subtle mass test set
                (FP marks/image)                (FP marks/image)
TP              Single system   Dual system     Single system   Dual system
90%             2.2             1.2             -               -
80%             1.5             0.7             2.8 (column unclear in source)
70%             1.0             0.3             2.4             2.3
60%             0.5             0.2             1.8             1.5
50%             0.3             0.1             1.4             0.7

Table II summarizes the test results on the average and
subtle mass sets for the dual system and the single CAD
system trained with average masses at different sensitivity
levels. The FP marker rates were estimated from the detection
on the normal data set.

The comparison of the FROC curves for the dual CAD
system and the single CAD system in terms of the area under
the fitted AFROC curve (A1) and the p values for both test
subsets with average masses is summarized in Table III.
The differences between the A1 values for the two systems
were statistically significant (p<0.05). The fitted AFROC
curves, however, did not fit very well to the transformed
AFROC data, as we discussed previously.”4 For the JAFROC
method, Chakraborty et al. provided software to estimate the
statistical significance of the difference between two FROC
curves. The comparison of the figure-of-merit (FOM) and the
p values is also summarized in Table III. The differences
between the FOM of the dual CAD system and that of the
single CAD system for both test subsets were again
statistically significant (p<0.05).

Fig. 10. Comparison of the average test FROC curves for the single CAD
system and the dual CAD system for detection of the subtle masses on the
prior mammograms. The single CAD system trained with average masses
alone was used and the FP rate was estimated from the mammograms
without masses. (a) Image-based FROC curves, (b) case-based FROC curves.

Fig. 11. Comparison of the average test FROC curves for the single CAD
system and the dual CAD system for detection of subtle malignant masses
on the prior mammograms. The single CAD system trained with average
masses alone was used and the FP rate was estimated from the mammograms
without masses. (a) Image-based FROC curves, (b) case-based FROC
curves.

The comparison of the A1, the FOM, and the p values for the
dual system and the single system trained with average
masses in detecting subtle masses is summarized in Table
IV. The differences between the results of the dual CAD
system and those of the single CAD system on the two test
subsets containing subtle masses were statistically
significant by both the JAFROC and the AFROC methods.

IV.  DISCUSSION 

The masses on prior mammograms are more subtle and
more difficult to detect than the masses on current mammograms.
In this study, we developed a dual CAD system,
which combines a system trained with masses on prior
mammograms and a system trained with masses detected on
current mammograms. We have demonstrated that this dual
system can increase the accuracy of detecting both average
masses and subtle masses. The comparisons of the dual system
with the single CAD system trained with average
masses alone and with the single CAD system trained with
both average and subtle masses (Fig. 7) indicate that the gain
in the detection accuracy of the dual system could not be
achieved by simply using a larger training set with both
average and subtle masses. In fact, it is interesting to note that
the performance of the single CAD system trained with both
the average and the subtle masses appeared to be degraded
slightly, in comparison with the single system trained with
average masses alone, when it was applied to the test set of
average masses. The decreased performance may reflect the
compromise made when the single CAD system was trained
to accommodate a wide range of lesion characteristics. Thus,
the dual system approach may have improved its performance
through other factors, including the flexibility in using
different feature spaces and training the parameters for each
type of masses and the information fusion combining the two
single CAD systems effectively.

Table III. Estimation of the statistical significance in the difference between the FROC performance of the dual
system and the single CAD system trained with average masses alone when the systems were evaluated on the
average mass test subsets. The FROC curves with the FP marker rates obtained from the normal data set were
compared.

                A1 (AFROC)                              FOM (JAFROC)
                All cases           Malignant cases     All cases           Malignant cases
Test subset     1         2         1         2         1         2         1         2
Single system   0.45      0.44      0.47      0.52      0.48      0.48      0.53      0.55
Dual system     0.55      0.53      0.58      0.62      0.60      0.56      0.63      0.64
p values        0.0004    0.0156    0.0003    0.0318    <0.0001   0.007     0.0004    0.0252

For the comparison of the different systems, we analyzed
the false negatives (FNs) of the single CAD systems and the
dual CAD system when the test subsets with average masses
were used. It was found that the FN rates of the single CAD
system trained with average masses, the single CAD system
trained with subtle masses, and the dual system were 23.9%
(55/230), 28.3% (65/230), and 16.5% (38/230), respectively,
after FP reduction by the morphological LDA classifier
in each system. Twenty-nine masses were missed by both
of the single systems. By using the dual system, 53 masses
that were FNs for either single system could be detected.
However, the masses that were missed by both of the single
CAD systems could not be recovered by the dual CAD
system.
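The counts in this paragraph are mutually consistent, which can be checked with inclusion-exclusion. The sketch below assumes that "FNs for either single system" means the union of the two single-system FN sets; the figure of 9 masses is derived here and is not reported in the text. The function name is hypothetical.

```python
def fn_summary(total, fn_average, fn_subtle, fn_both, fn_dual):
    """Reconcile false-negative counts of two single systems and the dual system."""
    # Masses missed by at least one single system (inclusion-exclusion).
    fn_either = fn_average + fn_subtle - fn_both
    return {
        "fn_either": fn_either,
        # Masses missed by a single system but detected by the dual system.
        "recovered": fn_either - fn_dual,
        # Dual-system FNs beyond those missed by both single systems.
        "missed_by_one_only": fn_dual - fn_both,
        "fn_rates_percent": tuple(
            round(100 * n / total, 1) for n in (fn_average, fn_subtle, fn_dual)
        ),
    }


summary = fn_summary(total=230, fn_average=55, fn_subtle=65, fn_both=29, fn_dual=38)
```

With the reported numbers, 55 + 65 - 29 = 91 masses were missed by at least one single system, and 91 - 38 = 53 of them were detected by the dual system, matching the text. The remaining 38 dual-system FNs would then comprise the 29 masses missed by both single systems plus 9 missed by only one of them.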

Our motivation for this study was to improve the performance
of a CAD system for mass detection. A CAD detection
system is generally intended for use in screening mammography.
At the screening stage, all lesions of concern
should be pointed out to radiologists so that the radiologists
can judge if a recall is warranted. If a detection system is
trained to mark only the malignant lesions, it may be attempting
to play the role of a triage system (alerting radiologists
to work up only “malignant” cases) rather than that of a
second reader. Furthermore, since computerized lesion detection
or characterization on mammograms is not 100% sensitive,
it will be unclear to the radiologists whether an unmarked
suspicious lesion was missed or was considered benign
by the computer. We believe that computer-aided diagnosis
(CADx) may be used in different ways in conjunction with a
CAD detection system; for example, the likelihood of malignancy
may be estimated by the CADx system and displayed
for every detected lesion, and/or a CADx system may be
used during diagnostic workup. Either way, the CAD system
will first alert radiologists to all masses, leaving the assessment
of malignancy or benignity to a second stage and with
the radiologist being the primary decision maker. The training
set thus included both malignant and benign masses.

For a CAD system, its performance for detecting malignant
masses is more important than its performance for detecting
all masses. The FROC curves for detection of malignant
masses on the average data set and the subtle data set,
shown in Figs. 9 and 11, respectively, indicated that the dual
system could also achieve an improvement in the detection
performance over that of the single system. The differences
in the A1 and the FOM for the detection of malignant cases in
the average and subtle mass test subsets were statistically
significant, as shown in Tables III and IV, respectively.

In screening mammography, the cancer rate is 3-5 per
1000. Most of the mammograms are normal. Therefore,
some CAD researchers and users estimate the FP rate using
normal mammograms5” 54 because it reflects how the CAD
system performs in terms of specificity and whether the CAD
system may cause extra efforts for radiologists to double
check the marked locations or unnecessary recalls in a
screening setting. Furthermore, for CAD systems that set a
maximum number of detected objects at the output, estimating
the number of FPs using images with lesions can potentially
lead to an optimistic bias for the FROC curve because
one of the detected objects will likely be the true lesion. The
FP rate can thus be underestimated by as much as 1 per
image. In addition, the JAFROC analysis requires that the FP
rates be estimated on normal images. We therefore reported
the FP rates of our CAD systems on both mammograms with
masses and without masses to facilitate comparison with
other CAD systems in case investigators may evaluate their
FP rates in either way.

Table IV. Estimation of the statistical significance in the difference between the FROC performance of the dual
system and the single CAD system trained with average masses alone when the systems were evaluated on the
subtle mass test subsets. The FROC curves with the FP marker rates obtained from the normal data set were
compared.

                A1 (AFROC)                              FOM (JAFROC)
                All cases           Malignant cases     All cases           Malignant cases
Test subset     1         2         1         2         1         2         1         2
Single system   0.17      0.20      0.24      0.25      0.21      0.23      0.24      0.26
Dual system     0.28      0.25      0.35      0.34      0.30      0.28      0.36      0.34
p values        <0.0001   0.046     <0.0001   0.0067    0.0007    0.048     <0.0001   0.0035

In this study, we evaluated the performance of the trained
CAD systems with an independent test set using the two-fold
cross-validation method. Although the selection of parameters
and features was performed using the training set, we
had full knowledge of the performance for the test set so that
the selections could be optimistically biased. True independent
testing will have to be performed with unknown cases
that have never been used for testing the CAD system before,
such as those in a prospective clinical trial. However, this
test step is beyond the scope of our current developmental
process. Since we used the same cross-validation method for
evaluation of the dual system and the single CAD systems,
the comparison of their relative performances is expected to
be less biased than their individual performances.


V.  CONCLUSION 

We have proposed a new dual system approach which
combines a system trained with subtle masses on prior
mammograms and a system trained with average masses on
current mammograms. The dual system achieved higher
sensitivities at the corresponding FP rates than a single CAD
system trained with average masses alone or trained with
both average masses and subtle masses. Equivalently, the
dual system had lower FP rates than the single CAD system
at corresponding sensitivities. The improvement in the
FROC curves by the dual system approach was found to be
statistically significant (p < 0.05) for both average masses
and subtle masses using either the AFROC or the JAFROC
method. Our results indicate that the dual system approach is
promising for improving the performance of CAD systems
for mass detection on mammograms.

Rationale and Objectives. To compare the performance of computer aided detection (CAD) systems on pairs of full-field
digital  mammogram  (FFDM)  and  screen-film  mammogram  (SFM)  obtained  from  the  same  patients. 

Materials and Methods. Our CAD systems on both modalities have similar architectures that consist of five steps.
For FFDMs, the input raw image is first log-transformed and enhanced by a multiresolution preprocessing scheme.
For digitized SFMs, the input image is smoothed and subsampled to a pixel size of 100 µm × 100 µm. For both
CAD systems, the mammogram after preprocessing undergoes a gradient field analysis followed by clustering-based
region growing to identify suspicious breast structures. Each of these structures is refined in a local segmentation
process. Morphologic and texture features are then extracted from each detected structure, and trained rule-based and
linear discriminant analysis classifiers are used to differentiate masses from normal tissues. Two datasets, one with
masses and the other without masses, were collected. The mass dataset contained 131 cases with 131 biopsy-proven
masses, of which 27 were malignant and 104 benign. The true locations of the masses were identified by an experienced
Mammography Quality Standards Act (MQSA) radiologist. The no-mass dataset contained 98 cases. The time
interval between the FFDM and the corresponding SFM was 0 to 118 days.

Results.  Our  CAD  system  achieved  case-based  sensitivities  of  70%,  80%,  and  90%  at  0.9,  1.5,  and  2.6  false  positive  (FP) 
marks/image,  respectively,  on  FFDMs,  and  the  same  sensitivities  at  1.0,  1.4,  and  2.6  FP  marks/image,  respectively,  on 
SFMs. 

Conclusions. The difference in the performances of our FFDM and SFM CAD systems did not achieve statistical
significance.

Key Words. Computer-aided detection; mass detection; full-field digital mammogram (FFDM); screen-film mammogram
(SFM); free-response receiver operating characteristic (FROC).

Full-field digital mammography (FFDM) and screen-film
mammography (SFM) are two available methods
for breast cancer screening in clinical practice. FFDM
detectors provide higher detective quantum efficiency
(DQE) and signal-to-noise ratio (SNR), wider dynamic
range, and higher contrast sensitivity than SFM. FFDM
may alleviate some of the limitations of SFM, especially
in breasts with dense fibroglandular tissue (1). In
the last few years, several FFDM systems became
commercially available because of the potential of digital
imaging to improve breast cancer detection.

Several clinical trials have been conducted to compare radiologists’ interpretation on FFDMs and
SFMs. Lewin et al (2,3) conducted a clinical study to compare FFDMs and SFMs for the detection of
breast cancer in 6,737 examinations of women 40 years of age and older collected from two
institutions. Forty-two cancers were detected within this population. The difference in cancer
detection was not statistically significant (P > .1) between FFDMs and SFMs. FFDMs resulted in fewer
recalls than did SFMs, a difference that was statistically significant (P < .001). Another clinical
trial (4) aiming at collecting data for US Food and Drug Administration approval included SFMs and
FFDMs of 676 women who were scheduled to undergo breast biopsy. The average area under the receiver
operating characteristic (ROC) curve, the sensitivity, and the specificity were 0.715, 0.66, and 0.67
for printed FFDM and 0.765, 0.74, and 0.60 for SFM, respectively. However, none of these differences
achieved statistical significance. Skaane et al (5-7) have conducted several clinical studies to
compare SFM and FFDM with soft-copy interpretation for reader performance in detection and
classification of breast lesions. According to their findings, there was no significant difference
between FFDM and SFM either in detection or in classification. A recent study by Pisano et al (1)
enrolled a total of 49,528 patients at 33 sites in the United States and Canada. Mammograms were
interpreted independently by two radiologists. The overall diagnostic accuracy of FFDMs and SFMs for
breast cancers was similar. However, FFDM was more accurate in women younger than age 50 years, women
with radiographically dense breasts, and premenopausal or perimenopausal women.

Studies indicate that radiologists do not detect all carcinomas that are visible on retrospective
analyses of the images (8-14). Computer-aided diagnosis (CAD) is considered to be one of the promising
approaches that may improve the sensitivity of mammography (15,16). Most of the mammographic CAD
systems developed so far are based on digitized SFMs. Li et al (17) attempted to adapt their CAD
system developed on SFMs for detection of masses on FFDMs by standardizing the FFDMs. Their
preliminary results on a small dataset (training on 36 normal and 24 mass cases, testing on 24 normal
and 10 mass cases) showed 60% sensitivity at 2.47 false positives (FPs)/image. Several commercial CAD
systems have been reported to perform comparably on FFDMs and SFMs. However, these results were not
reported in peer-reviewed journals, so the datasets and algorithms are unknown. So far, no study has
compared breast mass detection by a CAD system on FFDMs and SFMs from the same patients. We have
developed a CAD system for mass detection on SFMs (18,19) and are adapting the system to FFDMs. Our
preliminary study with 65 patients was reported previously (20). In this study, we compared the
performance of the two CAD systems on case-matched pairs of FFDMs and SFMs.


MATERIALS  AND  METHODS 


Materials 

Our study group consisted of patients with breast lesions that were categorized as suspicious and
recommended for biopsy. The patients had either FFDM or SFM for their clinical exams. Institutional
review board approval and patient informed consent were obtained to acquire corresponding mammograms
of the breast to be biopsied using the other modality. Therefore, the corresponding FFDM and SFM were
available only from one breast for each patient. The time interval between the SFM and the FFDM
ranged from 0 to 118 days. The dataset consisted of 229 patients aged 30-86 with a mean age of
55 ± 11 years. All cases have two mammographic views, the craniocaudal view and the mediolateral
oblique view or the lateral view, yielding a total of 458 FFDMs and 458 corresponding SFMs. The SFMs
were acquired with MinR2000 screen-film systems (Eastman Kodak, Rochester, NY) and digitized with a
LUMISCAN 85 laser film scanner (Lumisys, Los Altos, CA) at a pixel resolution of 50 µm × 50 µm and
4096 gray levels. The digitizer was calibrated so that gray-level values were linearly proportional
to the optical density in the range of 0-4, with a slope of 0.001 per pixel value. The digitizer
output was linearly converted so that a large pixel value corresponded to a low optical density.
FFDMs were acquired
with a GE Senographe 2000D system (GE Medical Systems, Milwaukee, WI). The GE system has a CsI
phosphor/a:Si active matrix flat panel digital detector with a pixel size of 100 µm × 100 µm and
14 bits per pixel. The raw FFDMs were used as the input of our CAD system.

Table 1
Description of Cases in the Mass Datasets and Subsets for Training and Testing in the Twofold Cross-Validation Scheme

                                            Mass Set        Mass Subset 1   Mass Subset 2
                                            FFDM    SFM     FFDM    SFM     FFDM    SFM
Total number of cases                       131     131     65      65      66      66
Total number of images                      262     262     130     130     132     132
Number of visible masses (by case)          131     130     65      65      66      65
Number of masses only visible on one view   8       9       5       5       3       4
Number of visible masses (by image)         254     251     125     125     129     126
Number of visible malignant masses          27      27      12      12      15      15
Number of visible benign masses             104     103     53      53      51      50

FFDM: full-field digital mammogram; SFM: screen-film mammogram.

The dataset included 131 cases containing masses and 98 cases containing microcalcifications without
a visible mass, as determined with visual inspection by an experienced radiologist. The 131 cases
will be referred to as the mass dataset and the 98 cases as the “no-mass” dataset in the following
discussion. The no-mass cases were considered as “normal” with respect to masses and were used to
estimate the FP mark rates of the CAD systems during testing. The mass dataset contained 131
biopsy-proven masses, of which 27 were malignant and 104 benign. By examining all available
information, including the diagnostic mammograms and reports, the true locations of the masses were
identified by an experienced Mammography Quality Standards Act (MQSA) radiologist. In these 131 mass
cases, 1 mass can be seen only on FFDMs, 7 masses can be seen on only one view on both FFDMs and
SFMs, and 3 masses can be seen on only one view on either FFDMs (1 mass) or SFMs (2 masses). There
were therefore 131 visible masses on FFDMs and 130 visible masses on SFMs if the masses were counted
by case. There were 254 visible and 8 invisible masses on FFDMs and 251 visible and 11 invisible
masses on SFMs if the masses were counted independently by mammographic view. The numbers of images
and masses in the mass dataset are listed in Table 1.
Figure 1 shows an example with a 7-mm malignant mass. The size of a mass was estimated as its longest
diameter seen on the mammograms. The visibility of the masses was rated by the experienced
radiologist on a 10-point scale, with 1 representing the most visible masses and 10 the most
difficult case relative to the cases seen in clinical practice. Figures 2 and 3 show the histograms
of mass sizes and visibility, respectively, for the mass set. The mass size ranged from 3 to 30 mm
(mean: 12.5 ± 4.9 mm on FFDMs and 12.6 ± 4.9 mm on SFMs) and the visibility ratings extended over
the entire range. Figure 4 shows the breast density in terms of BI-RADS category as estimated by the
radiologist for the FFDM and SFM datasets.

Figure 1. An example of mammograms with a region of interest (ROI) containing a malignant mass with
a size of 7 mm. (a) Full-field digital mammogram (FFDM) processed with the Laplacian pyramid
multiscale method, (b) digitized screen-film mammogram (SFM), (c) magnified ROI on FFDM, and
(d) magnified ROI on SFM. The SFM is displayed with the same resolution as that of the FFDM. The
apparently smaller breast size on SFM is mainly caused by the very dark breast periphery region on
the SFM that cannot be seen on the printed page.
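The by-image mass counts quoted above follow from the by-case counts and the number of masses seen on only one view. The short check below reproduces that arithmetic; the function and variable names are illustrative only, and the numbers are the ones stated in the text and in Table 1.

```python
# Consistency check of the mass counts reported in the text and Table 1.
# All names here are illustrative; only the numbers come from the paper.
N_CASES = 131
VIEWS_PER_CASE = 2


def visible_by_image(visible_by_case, one_view_only):
    """Each visible mass is seen on two views unless it is seen on only one."""
    return VIEWS_PER_CASE * visible_by_case - one_view_only


# FFDM: all 131 masses are visible; 7 + 1 masses are seen on only one view.
ffdm_visible = visible_by_image(131, 7 + 1)
# SFM: 1 mass is not visible at all; 7 + 2 masses are seen on only one view.
sfm_visible = visible_by_image(130, 7 + 2)

total_images = N_CASES * VIEWS_PER_CASE
ffdm_invisible = total_images - ffdm_visible
sfm_invisible = total_images - sfm_visible
print(ffdm_visible, ffdm_invisible, sfm_visible, sfm_invisible)
```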
Figure 2. Histogram of the sizes for 254 masses on full-field digital mammograms (FFDMs) and 251
masses on the screen-film mammograms (SFMs) in our dataset. Mass sizes are measured as the longest
dimension of the mass by an experienced Mammography Quality Standards Act (MQSA) radiologist. The
size of the masses in the dataset ranged from 3 to 30 mm (mean: 12.5 ± 4.9 mm on FFDMs and
12.6 ± 4.9 mm on SFMs).


[Figure 4 plot. Horizontal axis: Breast Density (categories 1 to 4).]

Figure 4. Distribution of the breast density for the 229 cases in terms of BI-RADS category
estimated by an MQSA radiologist.
Figure 3. Histogram of the visibility of the 254 masses seen on full-field digital mammograms and
251 masses seen on screen-film mammograms in our dataset. The visibility is evaluated on a 10-point
rating scale, with 1 representing the most visible masses and 10 the most difficult case relative to
the cases seen in clinical practice. Each mass on a mammogram is rated independently by an
experienced MQSA radiologist.


Methods


CAD  System 

The major steps in the mass detection systems on FFDMs and SFMs are similar, but the feature spaces
and classifiers for FP reduction in each system were designed separately to suit the characteristics
of FFDMs and SFMs. The two systems are therefore described together, but the differences will be
pointed out whenever applicable. Each single CAD system consists of five processing steps:
1) preprocessing, 2) prescreening of mass candidates, 3) segmentation of suspicious objects,
4) feature extraction and analysis, and 5) FP reduction by classification of normal tissue
structures and masses.

FFDMs are generally preprocessed with proprietary methods by the manufacturer of the FFDM system
before being displayed to readers. The image preprocessing method used depends on the manufacturer
of the FFDM system. To develop a CAD system that is less dependent on the FFDM manufacturer’s
proprietary preprocessing methods, we use the raw FFDM as input to our CAD system. We have
previously developed a multiscale preprocessing scheme for image enhancement (21). In brief, the raw
mammogram is first segmented automatically into the background and the breast region. A logarithmic
transform is applied to the image, which is then scaled to 12 bits. The Laplacian pyramid method
(21,22) is used to decompose the transformed breast image into multiple scales. A nonlinear weight
function based on the pixel gray level from each of the low-pass components is designed to enhance
the high-pass components. The processed image is reconstructed by summing the weighted components.
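The logarithmic transform and 12-bit scaling step can be sketched as follows. This is only a minimal illustration: the text does not state the scaling rule, so the `log(1 + p)` offset and the min-max mapping to 0..4095 are assumptions, and `log_transform_12bit` is a hypothetical helper name. The Laplacian pyramid decomposition itself is not reproduced here.

```python
import math


def log_transform_12bit(raw_pixels):
    """Apply a logarithmic transform and rescale the result to 0..4095.

    Assumptions (not stated in the paper): log(1 + p) is used to avoid
    log(0), and the output is min-max scaled to the 12-bit range.
    """
    logs = [math.log(1.0 + p) for p in raw_pixels]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        return [0 for _ in logs]  # flat input: nothing to scale
    return [round(4095 * (v - lo) / (hi - lo)) for v in logs]


# Hypothetical 14-bit raw pixel values within a breast region.
scaled = log_transform_12bit([0, 100, 16383])
```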

For SFMs, the full resolution digitized mammograms are smoothed with a 2 × 2 box filter and
subsampled by a factor of 2, resulting in images having a pixel size of 100 µm × 100 µm. These
images are used as input to the CAD system.
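Smoothing with a 2 × 2 box filter and then keeping every second pixel amounts to averaging non-overlapping 2 × 2 blocks, provided the retained samples are aligned with the blocks (an assumption about the alignment, which the text does not specify). A minimal sketch on a plain list-of-lists image, with the hypothetical helper name `smooth_and_subsample`:

```python
def smooth_and_subsample(image):
    """Average each non-overlapping 2 x 2 block of a 2-D list of pixels.

    A 50-micron image becomes a 100-micron image. Odd trailing rows or
    columns are dropped (an assumption made for this sketch).
    """
    rows = len(image) // 2 * 2
    cols = len(image[0]) // 2 * 2
    out = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            block_sum = (image[r][c] + image[r][c + 1]
                         + image[r + 1][c] + image[r + 1][c + 1])
            out_row.append(block_sum / 4.0)
        out.append(out_row)
    return out


small = smooth_and_subsample([[0, 4, 8, 8],
                              [4, 8, 8, 8],
                              [1, 1, 2, 2],
                              [1, 1, 2, 2]])
```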

After preprocessing, a two-stage gradient field analysis method (21,23) is used to identify the mass
candidates for either FFDMs or SFMs. In brief, a gradient field analysis is employed in the first
stage to identify potential mass candidates based on high values of the initial gradient field. Each
potential mass candidate is segmented by a region growing technique. The shape and the gray-level
information of the segmented object allow adaptive refinement of the gradient field analysis in the
second stage. Locations of high radial gradient convergence are then labeled as mass candidates.
These suspicious objects are segmented with a k-means clustering method (24). First, a 256 × 256
pixel region of interest (ROI) centered at the high gradient point is background-corrected (25) and
weighted by a Gaussian function with σ = 256 pixels. K-means clustering using the pixel values in a
background-corrected image and a Sobel-filtered image as features is then used to segment the object.
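The segmentation step can be illustrated with a small, self-contained sketch. It assumes two clusters (object and background), which the text does not state, and it takes the per-pixel features (background-corrected gray level, Sobel edge strength) as already computed. The helper names are hypothetical and this is not the authors' implementation.

```python
import math


def gaussian_weight(row, col, center, sigma=256.0):
    """Gaussian weight of pixel (row, col) relative to the ROI center."""
    d2 = (row - center[0]) ** 2 + (col - center[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def nearest_center(feature, centers):
    """Index of the cluster center closest to a feature vector."""
    dists = [sum((a - b) ** 2 for a, b in zip(feature, c)) for c in centers]
    return dists.index(min(dists))


def kmeans_two_clusters(features, max_iter=50):
    """Cluster feature vectors into two groups; returns one label per vector.

    Initialization is deterministic: cluster 0 starts at the sample with the
    lowest gray level and cluster 1 at the sample with the highest.
    """
    centers = [min(features, key=lambda f: f[0]),
               max(features, key=lambda f: f[0])]
    labels = [nearest_center(f, centers) for f in features]
    for _ in range(max_iter):
        new_centers = []
        for k in (0, 1):
            members = [f for f, lab in zip(features, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(tuple(centers[k]))  # keep an empty cluster
        new_labels = [nearest_center(f, new_centers) for f in features]
        centers = new_centers
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Hypothetical (gray level, edge strength) pairs for six pixels.
labels = kmeans_two_clusters(
    [(10, 1), (12, 2), (11, 1), (200, 30), (210, 28), (205, 31)])
```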

For each suspicious object, eleven morphological features (18) are extracted. A rule-based
classifier removes the detected structures that are substantially different from breast masses.
Global and local multiresolution texture analyses (26) are performed in each ROI by using the
spatial gray-level dependence (SGLD) matrices. Thirteen SGLD texture measures are used. Global
texture features are extracted from the entire ROI for two scales, seven distances, and two angles.
Local texture features are extracted from the local region containing the detected object and the
peripheral regions within each ROI for two scales, four distances, and two angles. Therefore, a
total of 364 features and 208 features, respectively, are extracted from global and local texture
analysis. The feature space for final classification is the combination of morphologic features and
SGLD texture features. Finally, linear discriminant analysis (LDA) is used to differentiate masses
from normal tissue in the feature space. The discriminant scores are ranked for each mammogram, and
any object with a discriminant score that ranks lower than third is eliminated.
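The stated feature totals are reproduced by multiplying the number of SGLD measures by the numbers of scales, distances, and angles, and the final ranking rule keeps at most three objects per mammogram. The sketch below checks the arithmetic and illustrates the ranking rule. It assumes that a higher discriminant score means "more mass-like", and the helper names are illustrative.

```python
N_SGLD_MEASURES = 13


def n_texture_features(scales, distances, angles, measures=N_SGLD_MEASURES):
    """Number of texture features for one combination of settings."""
    return measures * scales * distances * angles


n_global = n_texture_features(scales=2, distances=7, angles=2)
n_local = n_texture_features(scales=2, distances=4, angles=2)


def keep_top_ranked(scores, max_marks=3):
    """Indices of the objects kept on one mammogram.

    Assumes a higher discriminant score indicates a more mass-like object.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:max_marks])


# Hypothetical discriminant scores of five objects on one mammogram.
kept = keep_top_ranked([0.2, 0.9, 0.5, 0.7, 0.1])
```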

Training and Testing of the CAD System

Twofold cross-validation was used for training and testing our CAD system for FFDMs. We randomly
separated the mass datasets by case into two independent subsets: subset 1 with 65 cases and subset
2 with 66 cases. The numbers of masses by image and by case for the FFDM and SFM subsets are shown
in Table 1. The training included selection of proper parameters and features for the classifier in
the CAD system. After the training with one mass subset was completed, the parameters and features
were fixed for testing with the other mass subset. The training and test mass subsets were switched
and the training and test processes were repeated. The trained CAD systems were also applied to the
no-mass dataset, which was not used during training, to estimate the FP rate in screening
mammograms.
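A case-level twofold split of this kind might look like the sketch below. The random seed, the integer case identifiers, and the function name are assumptions made for illustration. The point is only that all images of a case stay in the same subset and that each subset serves once for training and once for testing.

```python
import random


def twofold_split_by_case(case_ids, seed=0):
    """Randomly split case identifiers into two disjoint subsets.

    Splitting by case (not by image) keeps both views of a patient in the
    same subset. The seed is arbitrary and only makes the sketch repeatable.
    """
    shuffled = sorted(case_ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# 131 hypothetical case identifiers, giving subsets of 65 and 66 cases.
subset1, subset2 = twofold_split_by_case(range(131))
# Each fold is (training subset, test subset); the roles are then switched.
folds = [(subset1, subset2), (subset2, subset1)]
```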

During training, feature selection with stepwise LDA was applied to obtain the best feature subset
and reduce the dimensionality of the feature space to design an effective classifier. The detailed
procedure has been described elsewhere (21,27,28). Briefly, at each step one feature was entered
into or removed from the feature pool by analyzing its effect on the selection criterion, which was
chosen to be Wilks’ lambda in this study. Because the appropriate threshold values for feature
entry, feature elimination, and tolerance of feature correlation were unknown, we used an automated
simplex optimization method to search for the best combination of thresholds in the parameter
space. The simplex algorithm used a leave-one-case-out resampling method within the training subset
to select features and estimate the weights for the LDA classifier. To have a figure of merit to
guide feature selection, the test discriminant scores from the left-out cases were analyzed using
ROC methodology (29). The accuracy for classification of masses and FPs was evaluated as the area
under the ROC curve, Az, for the test cases. In this approach, feature selection was performed
without the left-out case so that the test performance would be less optimistically biased (30).
However, the selected feature set in each leave-one-case-out cycle could be slightly different
because every cycle had one training case different from the other cycles. To obtain a single
trained classifier to apply to the cross-validation test subset, a final stepwise feature selection
was performed with the best combination of thresholds, found in the simplex optimization procedure,
on the entire training subset to obtain the final set of features and estimate the weights of the
LDA. Note that the entire process of feature selection and classifier weight estimation was
performed within the training subset. The LDA classifier with the selected feature set was then
fixed and applied to the cross-validation test subset. The training and testing processes were
performed independently for the twofold cross-validation sets.
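The leave-one-case-out resampling and the ROC figure of merit can be sketched as below. Two simplifications are assumed: the stepwise feature selection and LDA training inside each cycle are omitted, and the nonparametric Mann-Whitney estimate of the area under the ROC curve is substituted for the fitted Az used in the paper. All names are hypothetical.

```python
def leave_one_case_out(case_of_object):
    """Yield (held_out_case, train_indices, test_indices).

    Objects detected in the same case are never split between the
    training and test sides of a cycle.
    """
    for held_out in sorted(set(case_of_object)):
        train = [i for i, c in enumerate(case_of_object) if c != held_out]
        test = [i for i, c in enumerate(case_of_object) if c == held_out]
        yield held_out, train, test


def empirical_auc(mass_scores, fp_scores):
    """Mann-Whitney estimate of the ROC area (a stand-in for the fitted Az).

    Ties between a mass score and an FP score count one half.
    """
    wins = 0.0
    for m in mass_scores:
        for f in fp_scores:
            if m > f:
                wins += 1.0
            elif m == f:
                wins += 0.5
    return wins / (len(mass_scores) * len(fp_scores))


# Hypothetical example: four detected objects belonging to three cases.
cycles = list(leave_one_case_out(["a", "a", "b", "c"]))
# Hypothetical pooled test scores from the left-out cases.
auc = empirical_auc([0.9, 0.8, 0.4], [0.5, 0.3])
```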

Because we already trained our CAD system for SFMs with a large dataset in a previous study (19), we
used the trained system without retraining the parameters in this study. For testing, we divided
the SFMs into two test datasets that followed the same case grouping as that for FFDMs. The test
cases in each subset did not overlap with any training cases used for training the SFM CAD system
in the previous study.

Evaluation  Methods 

We  used  a  free-response  ROC  (FROC)  method  (31)  to 
assess  the  overall  performance  of  the  CAD  scheme  on 
this  image  set.  A  FROC  curve  is  obtained  by  plotting  the 
mass  detection  sensitivity  as  a  function  of  FP  marks  per 
image  as  the  decision  threshold  on  the  LDA  classifier 
scores  varies. 
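As a minimal sketch of how such a curve can be traced, the function below sweeps the decision threshold from the highest score downward and records one operating point per mark. It assumes each mark has already been labeled with the identifier of the mass it hits, or `None` for a false positive. The data layout and names are illustrative and not taken from the paper.

```python
def froc_points(marks, n_masses, n_images):
    """Return (fp_per_image, sensitivity) pairs as the threshold is lowered.

    marks: list of (score, mass_id); mass_id is None for a false-positive
    mark. A mass counts once, however many marks hit it, so using one
    mass_id for both views gives a case-based analysis.
    """
    points = []
    detected = set()
    n_fp = 0
    for score, mass_id in sorted(marks, key=lambda m: m[0], reverse=True):
        if mass_id is None:
            n_fp += 1
        else:
            detected.add(mass_id)
        points.append((n_fp / n_images, len(detected) / n_masses))
    return points


# Hypothetical marks on two images that contain two masses in total.
points = froc_points(
    [(0.9, "m1"), (0.8, None), (0.7, "m2"), (0.6, None), (0.5, "m1")],
    n_masses=2, n_images=2)
```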

The detected individual objects were compared with the “true” mass locations marked by the
experienced radiologist, as described previously. A detected object was labeled as true positive
(TP) if the overlap between the bounding box of the detected object and the bounding box of the
true mass relative to the larger of the two bounding boxes was over 25%. Otherwise, it would be
labeled as FP. The 25% threshold was selected as described in our previous study (18).
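The overlap criterion can be written compactly. The sketch below assumes axis-aligned boxes given as `(x0, y0, x1, y1)` in pixel coordinates, a representation chosen here for illustration, and applies the rule as stated: intersection area divided by the area of the larger box must exceed 25%.

```python
def box_area(box):
    """Area of an axis-aligned box (x0, y0, x1, y1); zero if degenerate."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def overlap_fraction(detected, truth):
    """Intersection area relative to the larger of the two bounding boxes."""
    inter = box_area((max(detected[0], truth[0]), max(detected[1], truth[1]),
                      min(detected[2], truth[2]), min(detected[3], truth[3])))
    larger = max(box_area(detected), box_area(truth))
    return inter / larger if larger > 0 else 0.0


def is_true_positive(detected, truth, threshold=0.25):
    """A detection is a TP only if the overlap fraction is over the threshold."""
    return overlap_fraction(detected, truth) > threshold


# Hypothetical boxes: the intersection is 36 of 100 pixels, so this is a TP.
hit = is_true_positive((0, 0, 10, 10), (4, 4, 14, 14))
```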

FROC curves were presented on a per-image and a per-case basis. For image-based FROC analysis, the
mass on each mammogram was considered an independent true object; the sensitivity was thus
calculated relative to the number of masses by image on each subset of FFDMs or SFMs (Table 1). For
case-based FROC analysis, the same mass imaged on the two-view mammograms was considered to be one
true object and detection of either or both masses on the two views was considered to be a TP
detection; the sensitivity was thus calculated relative to the number of masses by case on each
subset of FFDMs or SFMs (Table 1). The test FROC curve for a given mass subset was estimated by
counting the detected masses on the test mass subset for the sensitivity. The FP mark rate was
estimated in two ways: one from FPs detected in the same test mass subsets, the other from FPs
detected in the no-mass dataset. For the latter, we applied the trained CAD system to the entire
no-mass dataset. The average number of FP marks per image produced by the CAD system at a given
sensitivity was estimated by counting the detected objects in these cases at the corresponding
decision threshold. Because we used a twofold cross-validation method for training and testing, we
obtained two test FROC curves, one for each test subset, for each of the modalities. To summarize
the results for comparison, an average test FROC curve was derived by averaging the FP rates at the
same sensitivity along the FROC curves of the two corresponding test subsets.
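Averaging two FROC curves at matched sensitivities requires reading the FP rate off each curve at the same sensitivity values. The sketch below does this with linear interpolation between neighboring operating points, which is an assumption since the text does not say how intermediate values were obtained. The curves and names are hypothetical.

```python
def fp_rate_at(curve, sensitivity):
    """Linearly interpolate the FP rate of a curve at a given sensitivity.

    curve: list of (fp_per_image, sensitivity) pairs sorted by increasing
    sensitivity.
    """
    for (fp0, s0), (fp1, s1) in zip(curve, curve[1:]):
        if s0 <= sensitivity <= s1:
            if s1 == s0:
                return fp0
            t = (sensitivity - s0) / (s1 - s0)
            return fp0 + t * (fp1 - fp0)
    raise ValueError("sensitivity outside the range of the curve")


def average_froc(curve_a, curve_b, sensitivities):
    """Average the FP rates of two test curves at the same sensitivities."""
    return [((fp_rate_at(curve_a, s) + fp_rate_at(curve_b, s)) / 2.0, s)
            for s in sensitivities]


# Hypothetical test FROC curves from the two cross-validation subsets.
curve_1 = [(0.0, 0.0), (1.0, 0.8), (2.0, 1.0)]
curve_2 = [(0.0, 0.0), (2.0, 0.8), (3.0, 1.0)]
averaged = average_froc(curve_1, curve_2, [0.4, 0.8])
```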

[Figure 5 plot. Horizontal axis: Number of False Positives per Image.]

Figure 5. Comparison of free-response receiver operating characteristic (FROC) curves on full-field
digital mammograms and screen-film mammograms during the prescreening stage. The FROC curves were
generated by varying the number of detected suspicious objects per image based on the ranking of
the local maxima on gradient field images. The FP rate was estimated from the mammograms with
masses.

To compare the performance of our CAD system for FFDMs and SFMs statistically, we applied the
alternative free-response ROC (AFROC) method and the jackknife free-response ROC (JAFROC) method
developed by Chakraborty et al (32,33) to the pairs of FROC curves. In the AFROC method, the FROC
data are first transformed by counting the number of false-positive images instead of the FPs per
image. The LDA score of a false-positive image is determined by the highest-scoring FP object on the
image regardless of how many lower-scoring FP objects are marked on the same image. The ROCKIT curve
fitting software and statistical significance tests for ROC analysis developed by Metz et al (29)
can then be used to analyze the AFROC data.
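The AFROC transformation described above reduces all FP marks on an image to a single score. A minimal sketch is given below, with an illustrative data layout of `(image_id, score)` pairs for the FP marks. The curve fitting and significance testing done with ROCKIT and JAFROC are not reproduced.

```python
def false_positive_image_scores(fp_marks):
    """Map each image with FP marks to the highest FP score on that image.

    fp_marks: list of (image_id, score) pairs for false-positive marks.
    """
    best = {}
    for image_id, score in fp_marks:
        if image_id not in best or score > best[image_id]:
            best[image_id] = score
    return best


def false_positive_image_fraction(fp_marks, n_images, threshold):
    """Fraction of images with at least one FP mark at or above threshold."""
    best = false_positive_image_scores(fp_marks)
    return sum(1 for s in best.values() if s >= threshold) / n_images


# Hypothetical FP marks on two of four images.
fp_marks = [("im1", 0.9), ("im1", 0.4), ("im2", 0.3), ("im2", 0.6)]
per_image = false_positive_image_scores(fp_marks)
fraction = false_positive_image_fraction(fp_marks, n_images=4, threshold=0.5)
```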


RESULTS 


For simplicity, we combined the detection results on the two test subsets from the twofold
cross-validation process in the following discussion. The prescreening stage detected 91.3%
(232/254) of the masses with an average of 10.13 (2,655/262) FPs/image on FFDMs and 93.2% (234/251)
with an average of 14.43 (3,781/262) FPs/image on SFMs. Figure 5 compares the FROC curves on FFDMs
and SFMs during the prescreening stage. The FROC curves were generated by varying the number of
detected suspicious objects per image based on the ranking of local maxima on the gradient field
images.
We used two steps for FP reduction for both CAD systems. The first step was the rule-based
classification based on morphologic features. After this step, there were 2,572 mass candidates
(9.8 objects/image) on FFDMs and 3,654 mass candidates (13.9 objects/image) on SFMs without
additional false negatives (FNs) for the test sets of 262 images. The second step was the LDA
classification. A total of 16 (4 global texture features, 7 local texture features, and 5
morphologic features) and 12 (4 global texture features, 4 local texture features, and 4
morphologic features) features, respectively, were selected from the two independent training
subsets for FFDMs. The feature set for SFMs contained a total of 21 features (11 global texture
features, 7 local texture features, and 3 morphologic features), as obtained from previous training.

Figure 6 shows the comparison of the average test FROC curves of the CAD systems for FFDMs and SFMs.
The FFDM CAD system achieved a case-based sensitivity of 70%, 80%, and 90% at 0.67, 1.15, and 1.93
FPs/image, respectively, compared with 0.75, 1.06, and 1.86 FPs/image for the SFM CAD system.
Because two trained CAD systems were obtained for the FFDMs from the cross-validation training, we
applied each of the trained systems to the no-mass dataset for FROC analysis, and estimated the
number of FP marks per image on the no-mass cases at each decision threshold. For each trained CAD
system, the sensitivity was estimated from the detected masses on the test mass subset and plotted
against the FP rate estimated from the no-mass set. Figure 7 shows the average FROC curves for FFDMs
and SFMs, similar to those shown in Fig 6, except that the FP rates were estimated from the no-mass
dataset.

The comparison of the FROC curves for the FFDM and SFM CAD systems in terms of the area under the
fitted AFROC curve (A1) and the P values for both test mass subsets are summarized in Table 2. The
differences in the A1 values between the two modalities did not achieve statistical significance
(P > .05). The fitted AFROC curves, however, did not fit very well to the transformed AFROC data,
as discussed previously (21). For the JAFROC method, Chakraborty et al provided software to estimate
the statistical significance of the difference between two FROC curves. The comparison of the
figure-of-merit (FOM) and the P values is also summarized in Table 2. The differences in the FOMs
between the FFDM and SFM CAD systems again did not achieve statistical significance (P > .05).

There were 27 malignant cases in the mass set. Figure 8 compares the average test FROC curves of the
FFDM and SFM CAD systems for detection of malignant masses. The FP rate was estimated from the
no-mass dataset. In this case, the FFDM CAD system achieved a case-based sensitivity of 70%, 80%,
and 90% at 0.37, 0.73, and 1.31 FP marks/image, respectively, which were substantially better than
the FP rates of 1.1, 1.6, and 2.0 FP marks/image for the SFM CAD system. However, the difference
did not achieve statistical significance (P > .05).

Figure 6. Comparison of the average test free-response receiver operating characteristic (FROC)
curves obtained from averaging the FROC curves of the two independent mass subsets on full-field
digital mammograms and screen-film mammograms. The FP rate was estimated from the mammograms with
masses. (a) Image-based FROC curves and (b) case-based FROC curves.

A total of 105 FFDM cases and 134 SFM cases were identified as BI-RADS 3 and 4 categories by an
MQSA radiologist (Fig 4). Of these, 88 cases (56 mass cases and 32 no-mass cases) were in common.
Figure 9 compares the average test FROC curves of the FFDM and SFM CAD systems for detection of
masses only on this common subset of dense breasts. The FP rate was estimated from the 32 no-mass
dense breasts. Although the FROC curve for the FFDMs appears to be slightly higher than that of the
SFMs, the difference did not achieve statistical significance (P > .05).

Figure 7. Comparison of the average test free-response receiver operating characteristic (FROC)
curves obtained from averaging the FROC curves of the two independent mass subsets on full-field
digital mammograms and screen-film mammograms. The FP rate was estimated from the mammograms
without masses. (a) Image-based FROC curves and (b) case-based FROC curves.


DISCUSSION 


CAD systems have been shown to be helpful as a second opinion to assist radiologists in
interpretation of SFMs. Recently, several studies have been conducted to compare FFDM with SFM in
screening cohorts (1,4,5,34). These clinical trials arrived at different conclusions about the
advantages or disadvantages of FFDM in comparison to conventional SFM systems. Some of the
differences may be attributed to factors such as the mammographic equipment, the study design, the
sample sizes, and the reader experience. It is also important to compare the performances of FFDM
and SFM CAD systems. In our study, we compared the performance of the two systems on pairs of FFDM
and SFM obtained from the same patients at close time intervals.

Several FFDM systems have been approved for clinical applications. Because digital detectors
generally have a linear response to x-ray exposure, the raw pixel values are a linear function of
the absorbed x-ray energy in the detector. To develop a CAD system that is less dependent on the
FFDM manufacturer’s proprietary preprocessing methods, we used the raw FFDM as input to our CAD
system. Although the spatial resolution and noise properties of the images from different detectors
were still different, the use of raw images already reduced one of the major differences between
mammograms from different FFDM systems. For preprocessing of the raw FFDMs, we developed a
multiresolution enhancement method. From our observation on the SFMs and the processed FFDMs, the
breast tissue on SFMs appears to be denser than that on FFDMs (35). This may be attributed to the
harder beam quality used and the Laplacian enhancement on FFDMs. In this study, 134 SFM cases were
rated as BI-RADS 3 and 4 categories by an MQSA radiologist, whereas only 105 FFDM cases were rated
as BI-RADS 3 and 4. When the FFDM and SFM CAD systems were applied to the small common subset (56
with masses and 32 without masses) of dense breasts rated as BI-RADS 3 and 4, there was no
significant difference between their average test FROC curves (Fig 9).

The overall performances of the CAD systems for the two modalities did not demonstrate a
significant difference for comparisons in either the subsets or the entire dataset. One factor
may be the substantially smaller number of training samples used for the FFDM CAD system than
that for the SFM CAD system, which was trained with a set of 486 SFMs in a previous study
(19). We have shown previously that a classifier designed with a larger number of training
samples will have better generalization to unknown test cases (36). Furthermore, because our
CAD system was originally developed on SFMs, some of the techniques used may favor SFMs. If
new techniques are designed to specifically suit the properties of FFDMs, the biases may be
reduced. Further investigations are underway to improve the FFDM CAD system.

Table 2

Estimation of the Statistical Significance of the Difference in the FROC Performances Between the FFDM and SFM CAD Systems

          A1 (AFROC)                                                  FOM (JAFROC)
          All Cases                     Malignant Cases               All Cases                     Malignant Cases
          Test Subset 1  Test Subset 2  Test Subset 1  Test Subset 2  Test Subset 1  Test Subset 2  Test Subset 1  Test Subset 2
FFDM      0.48           0.49           0.51           0.49           0.47           0.48           0.55           0.47
SFM       0.42           0.43           0.47           0.42           0.46           0.41           0.48           0.42
P values  .17            .16            .56            .23            .73            .33            .29            .59

The FROC curves with the FP marker rates obtained from the no-mass dataset were compared.

FROC, free-response receiver operating characteristic; FFDM, full-field digital mammogram;
SFM, screen-film mammogram; CAD, computer-aided detection; AFROC, alternative free-response
receiver operating characteristic; FOM, figure-of-merit; JAFROC, jackknife free-response ROC.

Figure 8. Comparison of the average test free-response receiver operating characteristic
(FROC) curves of computer-aided detection systems on full-field digital mammograms and
screen-film mammograms for mammograms with malignant masses. The FP rate was estimated from
the mammograms without masses. (a) Image-based FROC curves and (b) case-based FROC curves.
Horizontal axis of both panels: number of false positives per image.

We used a twofold cross-validation method for training and testing of the CAD systems.
Feature selection and classifier weight design were performed within the training subset and
thus were independent of the test subset. Kupinski et al (37) showed that feature selection
and classifier weight design using the same training set of a limited size will introduce
additional optimistic bias to the training result and thus additional pessimistic bias to the
test result. Under the constraint of a limited training set, it is still unknown whether bias
is reduced or increased when the training set is further split into two subsets for separate
feature selection and classifier weight design, compared with using the entire set of
available training samples for both processes. The relative efficiency of different resampling
techniques in utilization of a limited dataset for classifier design with or without feature
selection remains an important area of further studies.

In screening mammography, the cancer rate is about 3-5 per 1,000. Most of the mammograms are
normal. Therefore, some CAD researchers and users estimate the FP rate using normal
mammograms (38-40) because it reflects how the CAD system performs in terms of specificity in
a screening setting. Furthermore, for CAD systems that set a maximum number of detected
objects at the output, estimating the number of FPs using images with lesions can potentially
lead to an optimistic bias for the FROC curve because one of the detected objects will likely
be the true lesion. The FP rate can thus be underestimated by as much as 1 per image. In
addition, the JAFROC analysis requires that the FP rates be estimated on normal images. We
therefore reported the FP rates of our CAD systems on both mammograms with masses and
mammograms without masses to facilitate comparison with other CAD systems, because
investigators may evaluate their FP rates in either way.

Figure 9. Comparison of the average test free-response receiver operating characteristic
(FROC) curves of computer-aided detection systems on full-field digital mammograms and
screen-film mammograms for the common subset of 56 dense breasts with masses rated as BI-RADS
3 and 4. The FP rate was estimated from 32 no-mass dense breasts that were also rated as
BI-RADS 3 and 4. (a) Image-based FROC curves and (b) case-based FROC curves. Horizontal axis:
number of false positives per image.

Although we collected case-matched FFDMs and SFMs for comparing the performances of the CAD
systems, the images may not be exactly matched. Variations from positioning, compression
force, and the difference in time between the two acquisitions would cause differences in the
subtlety of the masses on the FFDMs and SFMs. However, assuming that the differences are
random, both datasets would include images that have better or worse positioning, for
example, than that on the other modality. The differences in the various factors would likely
be averaged out over the entire dataset. We therefore do not expect them to cause substantial
bias in the comparison of the relative performances of the CAD systems for the two modalities.

For a CAD system, its performance for detecting malignant masses is more important than its
performance for detecting all masses. We only have 27 malignant cases in this dataset.
Although the FROC curves for detection of malignant masses (Fig 8) indicated that the FFDM
CAD system had a higher sensitivity than that of the SFM CAD system, the differences in the
A1 and the FOM did not achieve statistical significance (P > .05) for either test subset, as
shown in Table 2. A larger dataset is being collected for further comparison of the FFDM and
SFM CAD systems for breast cancer cases.

Conclusion 

We compared the performance of our CAD systems for detection of breast masses on case-matched
FFDM images and SFM images. The two CAD systems used similar computer vision techniques, but
their preprocessing methods were different and the FP classifiers were separately trained to
adapt to the image properties of each modality. From the comparison of FROC curves, it was
found that the FFDM CAD system achieved higher detection sensitivity than the SFM CAD system
at the same FP rates for malignant cases. However, the performances of our FFDM and SFM CAD
systems for the entire dataset were similar. The differences between the two modalities were
not statistically significant with either the AFROC or the JAFROC method for either the entire
dataset or the malignant cases alone. Further study is under way to collect a larger dataset
and to improve the performances of both systems.
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ABSTRACT 


We have developed a false positive (FP) reduction method based on analysis of bilateral
mammograms for computerized mass detection systems. The mass candidates on each view were
first detected by our unilateral computer-aided detection (CAD) system. For each detected
object, a regional registration technique was used to define a region of interest (ROI) that
is “symmetrical” to the object location on the contralateral mammogram. Texture features
derived from the spatial gray level dependence (SGLD) matrices and morphological features
were extracted from the ROI containing the detected object on a mammogram and its
corresponding ROI on the contralateral mammogram. Bilateral features were then generated from
corresponding pairs of unilateral features for each object. Two linear discriminant analysis
(LDA) classifiers were trained from the unilateral and the bilateral feature spaces,
respectively. Finally, the scores from the unilateral LDA classifier and the bilateral LDA
asymmetry classifier were fused with a third LDA whose output score was used to distinguish
true masses from FPs. A data set of 341 cases of bilateral two-view mammograms was used in
this study, of which 276 cases with 552 bilateral pairs contained 110 malignant and 166
benign biopsy-proven masses and 65 cases with 130 bilateral pairs were normal. The mass data
set was divided into two subsets for 2-fold cross-validation training and testing. The normal
data set was used for estimation of FP rates. It was found that our bilateral CAD system
achieved a case-based sensitivity of 70%, 80%, and 85% at average FP rates of 0.35, 0.75, and
0.95 FPs/image, respectively, on the test data sets with malignant masses. In comparison to
the average FP rates for the unilateral CAD system of 0.58, 1.33, and 1.63 FPs/image,
respectively, at the corresponding sensitivities, the FP rates were reduced by 40%, 44%, and
42% with the bilateral symmetry information. The improvement was statistically significant
(p<0.05) as estimated by JAFROC analysis.

Keywords:  computer-aided  detection  (CAD),  bilateral  analysis,  mass  detection,  false  positive 
reduction. 
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I.  INTRODUCTION 


Breast cancer is one of the leading causes of death among American women between 40 and 55
years of age.1 It has been reported that early diagnosis and treatment can significantly
improve the chance of survival for patients with breast cancer. Although mammography is a
powerful screening tool for detecting breast cancer,5,6 studies indicate that a substantial
fraction of breast cancers that are visible upon retrospective analyses of the images are not
detected initially. It has been shown that computer-aided detection (CAD) can increase the
cancer detection rate by radiologists both in the laboratory and in clinical practice.10-15

In screening mammography, two mammographic views, the cranio-caudal (CC) and the mediolateral
oblique (MLO) views, are generally taken of each breast. During mammographic interpretation,
the radiologist combines complex information including morphology, texture, and geometric
location of any suspicious structures of the imaged breast from different views, asymmetric
density patterns between bilateral mammograms of the same view, and changes between the
current and the prior mammograms if available. Radiologists have found that these techniques
are effective in improving the accuracy of detecting subtle lesions and reducing false
positives (FPs).

Investigators have attempted to implement the multiple image techniques in CAD systems to
improve the detection accuracy of abnormalities and the classification accuracy of
differentiating malignant and benign lesions. Hadjiiski et al.16 developed an interval change
analysis of masses on current and prior mammograms and found that the classification accuracy
of masses can be improved significantly in comparison to single image classification.
Paquerault et al. developed a two-view (CC and MLO views) fusion technique to reduce FPs in
mass detection and obtained a significant improvement in comparison to their one-view
detection system. Engeland et al.18 recently presented a two-view CAD system that uses
features including the difference in the radial distance from the candidate regions to the
nipple, the gray scale correlation between both regions, and the mass likelihood of the
regions determined by the single view CAD scheme. Yin et al.19 used bilateral subtraction in a
prescreening step of a mass detection program to locate mass candidates, but the subsequent
image analysis was performed based only on a single view. Mendez et al.20 developed a
bilateral CAD system based on a bilateral subtraction approach and used size and eccentricity
tests and texture features to eliminate FPs. Again, the bilateral information is only used to
find the suspicious objects and the subsequent analysis is based on a single view.

The detection of masses on mammograms is a challenging task. The normal fibroglandular tissue
in the breast causes FPs by mimicking masses and causes false negatives (FNs) by overlapping
with lesions. In order to improve the performance of our mass detection system, we are
investigating computer-vision methods that incorporate information from two-view
mammograms17 and bilateral mammograms,21 emulating radiologists’ mammographic interpretation
techniques. In this study, we discuss our approach to FP reduction by analyzing the symmetry
or asymmetry of density patterns between bilateral mammograms.

II.  MATERIALS  AND  METHODS 

A.  Data  Sets 

A database of mammograms was collected from patient files at the Department of Radiology with
Institutional Review Board (IRB) approval. The mammograms were digitized by a Lumiscan laser
scanner with a pixel size of 50 µm × 50 µm and 12 bits per pixel. The pixel size was increased
to 100 µm × 100 µm by averaging every 2 × 2 adjacent pixels before being input to the CAD
system. In this study, two data sets are used: a mass data set containing bilateral digitized
mammograms with malignant or benign masses and a no-mass data set containing bilateral
digitized mammograms without masses, verified by an experienced radiologist. All cases had
four mammographic views, the CC view and the MLO view mammograms for both breasts. The mass
data set and the no-mass data set contained 276 cases (552 bilateral pairs) and 65 cases (130
bilateral pairs), respectively, yielding a total of 1364 mammograms. The mass data set was
used to estimate the detection sensitivity and the no-mass data set was used for estimating
the FP rate (number of FPs per image). In the mass data set, each patient had a biopsy-proven
mass in one of the breasts, resulting in a total of 276 masses, 166 of which were benign and
110 malignant. An MQSA radiologist identified the location of the masses based on all
available diagnostic and clinical information of the case, measured the mass sizes as the
longest dimension seen on the two-view mammograms, provided descriptors of the mass shapes
and mass margins, and also provided an estimate of the breast density in terms of BI-RADS
category. Figure 1 shows the distributions of mass sizes, mass shapes, mass margins, and
breast density in our data set.
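The 2 × 2 pixel averaging used to go from 50 µm to 100 µm pixels can be illustrated with a minimal sketch. The function name `average_2x2` and the handling of a trailing odd row or column (dropped here) are assumptions made for illustration; the source does not describe how image borders were treated.

```python
def average_2x2(image):
    """Downsample a 2-D list of pixel values by averaging each 2x2 block.

    A trailing odd row or column is dropped (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - rows % 2, 2):
        out_row = []
        for c in range(0, cols - cols % 2, 2):
            block_sum = (image[r][c] + image[r][c + 1]
                         + image[r + 1][c] + image[r + 1][c + 1])
            out_row.append(block_sum / 4.0)
        out.append(out_row)
    return out
```

Each output pixel covers four input pixels, so the pixel pitch doubles in both directions while the gray-level range is preserved.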

For training and evaluation of the performances of the CAD systems, the cases in our mass
data set were divided into two independent data subsets containing 136 and 140 cases,
respectively, for two-fold cross validation training and testing. Of the 136 cases in subset
1, 52 were malignant and 84 were benign. Of the 140 cases in subset 2, 58 were malignant and
82 were benign. The no-mass data set was not used during training. All 260 no-mass
mammograms were kept as independent test samples to be used with both test subsets.

B.  Methods

Our bilateral CAD system combines unilateral features with bilateral features to reduce FPs.
Similar structures that appear in both right and left mammograms at corresponding locations
are more likely to be normal tissue than masses, whereas asymmetric density may indicate a
developing lesion. The key to this system is therefore the design of a classifier that can
differentiate symmetry and asymmetry of paired ROIs in corresponding regions on bilateral
mammograms of the same view. The system consists of four steps: (1) mass candidate (MC)
localization, (2) corresponding ROI (CR) registration, (3) feature extraction and analysis,
and (4) bilateral information fusion. Figure 2 shows the block diagram for our bilateral CAD
system. The detailed description for each step is presented below.


1.  Mass  Candidate  Localization 

Identification of mass candidates is performed by the following two steps: breast
segmentation and mass candidate detection. The breast image is first segmented from the
surrounding image background by boundary detection.

The algorithm developed by Zhou et al.22 in our laboratory is used to track the breast
boundary and segment the breast from the background. Mass detection is performed only in the
breast region. We have previously developed a mass detection system for unilateral
mammograms. The system is used for mass candidate detection in the current study. The system
performs mass detection in two steps. In the first step, a gradient field analysis method is
used to determine the seeds of mass candidates, followed by a region growing24 method to
segment the mass candidates starting from those seeds. In the second step, the gradient
convergence is calculated using the gray levels and the shape of the segmented mass region as
a priori information. The mass candidates that pass the gradient convergence criterion are
retained for further analysis in the bilateral system. Figure 3 shows an example of mass
candidates detected on a mammogram. Figures 3(a), 3(b), and 3(c) show the original image, the
detected breast boundary, and the detected mass candidates, respectively.
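To make the seed-then-grow idea concrete, the following is a minimal sketch of region growing from a single seed. It is not the detection system described above: the inclusion rule (gray level at or above a fixed `threshold`), the 4-connectivity, and the name `region_grow` are all assumptions chosen for illustration, and the gradient field analysis that produces the seeds is not shown.

```python
from collections import deque


def region_grow(image, seed, threshold):
    """Grow a 4-connected region from `seed` over pixels >= threshold.

    `image` is a 2-D list of gray levels and `seed` is a (row, col) tuple
    assumed to satisfy the inclusion rule. Returns a set of (row, col).
    """
    rows, cols = len(image), len(image[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and image[nr][nc] >= threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region
```

Bright pixels that are not connected to the seed are left out, which is why each seed yields one compact candidate region.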
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2.  Corresponding  ROI  Registration 

For each mass candidate, its corresponding ROI on the contralateral mammogram is identified
by the regional registration technique developed previously in our laboratory16 with a
modification to handle the special case when the distance between the nipple location and the
center of an ROI is too small to obtain the intersection points on the breast boundary. The
nipple location on each image was manually identified so that the effectiveness of the
bilateral analysis method could be evaluated independent of nipple detection errors.

The original regional registration technique included the following steps. The registration
is performed in a polar coordinate system whose origin is located at the nipple location of a
breast image. Figure 4 shows an example of locating the corresponding ROI of a mass candidate
on the contralateral mammogram. Using the distance r from the nipple o to the center of the
mass m as the radius, an arc centered at the origin (nipple) is drawn. The arc passes through
the mass candidate and intersects the breast boundary at two points, p and q. The angle
between om and op is defined as $\theta$, and the angle between op and oq is defined as
$\alpha$. On the contralateral mammogram, the corresponding ROI m' is localized with a
similar procedure. An arc of radius r centered at the nipple o' of the contralateral
mammogram is drawn. The intersections of the arc with the breast boundary are p' and q'. The
angle between o'p' and o'q' is defined as $\alpha'$. The location of the corresponding ROI,
as determined by the angle $\theta'$ between o'p' and the radius o'm', is estimated as
$\theta' = \theta \alpha' / \alpha$. The coordinate of the center of the corresponding ROI is
therefore given by $(r, \theta')$.
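The angle scaling in this registration step can be sketched as follows. Here `theta` is the angle of the mass candidate measured from the first boundary intersection, `alpha` is the arc angle between the two boundary intersections on the same breast, and `alpha_prime` is the corresponding arc angle on the contralateral breast. The function names, the counterclockwise angle convention, and the Cartesian conversion helper are assumptions added for illustration only.

```python
import math


def corresponding_roi_polar(r, theta, alpha, alpha_prime):
    """Return (r, theta') with theta' = theta * alpha' / alpha (radians)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return r, theta * alpha_prime / alpha


def polar_to_xy(nipple_xy, r, ref_angle, theta_prime):
    """Convert (r, theta') to x, y around the contralateral nipple.

    `ref_angle` is the direction of the ray o'p'; angles are assumed to be
    measured counterclockwise in a standard x-right, y-up frame.
    """
    ox, oy = nipple_xy
    angle = ref_angle + theta_prime
    return ox + r * math.cos(angle), oy + r * math.sin(angle)
```

For example, with an arc angle of 120 degrees on one breast and 100 degrees on the other, a candidate at 30 degrees maps to 25 degrees, so its relative position along the arc is preserved.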

In some special cases in which the nipple is located within the breast, not on the breast
boundary (referred to as an inward nipple), our original regional registration method may
fail because the distance from the nipple to the mass candidate can be too short to obtain
two intersection points on the breast boundary. In order to handle those special cases, a new
origin is derived by horizontally shifting the origin of the polar coordinate system toward
the breast boundary until the intersection on the breast boundary is reached. In this way,
the radius can be roughly determined such that the corresponding ROI location can be
estimated. Figure 5 shows an example of the modified regional registration. Figure 5(a) shows
an example in which the distance om between the nipple and a mass candidate is too small to
obtain two intersection points at the breast boundary. After horizontally shifting the origin
from o to n in Figure 5(a) and the origin from o' to n' in Figure 5(b), the location of the
corresponding ROI m' is estimated based on the new origins using the regional registration
technique as described above.

3.  Feature  Extraction  and  Analysis

3.1  Feature  Extraction 

For the feature analysis, two types of features, SGLD (spatial gray-level dependence) texture
features and morphological features, are extracted from both the ROI containing the detected
mass candidate and its contralateral ROI.

For the SGLD features, thirteen texture measures24-26 are extracted from the entire ROI
(referred to as the global texture features) at 14 distances and 2 angles with a total of 364
(13×14×2) features. The same 13 texture measures are extracted from the central region
containing the detected object and the peripheral regions within each ROI (referred to as the
local texture features) at 4 distances and 2 angles with a total of 104 (13×4×2) features
from the central region and 104 features as the difference of the corresponding features in
the central and the peripheral regions.25
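As a rough illustration of how a single SGLD texture measure is obtained for one distance and one angle, the sketch below builds a gray-level co-occurrence matrix for a pixel displacement and computes two commonly used measures, contrast and energy. The symmetric counting, the normalization, the choice of these two measures, and the function names are assumptions; the source does not list the thirteen measures or the matrix construction details in this section.

```python
def sgld_matrix(image, dr, dc, levels):
    """Normalized, symmetric co-occurrence matrix for displacement (dr, dc).

    `image` is a 2-D list of integer gray levels in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count each pair in both directions
                total += 2
    if total == 0:
        raise ValueError("displacement is larger than the image")
    return [[n / total for n in row] for row in counts]


def contrast(p):
    """Sum of (i - j)^2 * p[i][j]; large for abrupt gray-level changes."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def energy(p):
    """Sum of p[i][j]^2; large for uniform texture."""
    return sum(v * v for row in p for v in row)
```

Repeating such measures over every distance and angle is what produces the feature counts quoted above, for example 13 measures × 14 distances × 2 angles = 364 global features.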


Twelve morphological features are extracted from the object segmented within the ROI.24,25
Five of them are based on the normalized radial length (NRL), defined as the Euclidean
distance from the object centroid to each of its edge pixels and normalized relative to the
maximum radial length for the object. In our previous studies, we found that the mean,
standard deviation, entropy, area ratio, and zero crossing count features derived from the
NRL are useful for discriminating between objects containing masses and normal tissue. The
other six morphological features are the perimeter, area, perimeter-to-area ratio,
circularity, rectangularity, and contrast of the object. The last morphological feature is
the summary Fourier descriptor measure, which is obtained from the Fourier transform of the
object boundary sequence. Objects with more irregular contours have more high-frequency
components than those with smooth contours.29
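The normalized radial length and two of the features derived from it (mean and standard deviation) can be sketched as below. The function name, the (row, col) point format, and the use of the population standard deviation are assumptions; the entropy, area ratio, and zero crossing count features are not shown.

```python
import math


def nrl_mean_std(centroid, edge_pixels):
    """Mean and standard deviation of the normalized radial length (NRL).

    Each radial length is the Euclidean distance from `centroid` to an edge
    pixel, divided by the maximum radial length of the object.
    """
    cy, cx = centroid
    lengths = [math.hypot(y - cy, x - cx) for y, x in edge_pixels]
    max_len = max(lengths)
    if max_len == 0:
        raise ValueError("all edge pixels coincide with the centroid")
    nrl = [d / max_len for d in lengths]
    mean = sum(nrl) / len(nrl)
    variance = sum((v - mean) ** 2 for v in nrl) / len(nrl)
    return mean, math.sqrt(variance)
```

A circular object gives a mean NRL near 1 with a standard deviation near 0, whereas an elongated or irregular object gives a lower mean and a larger spread.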

3.2  Unilateral  CAD  System 

The unilateral LDA classifier uses only the SGLD texture features as input predictor
variables, as described previously. The stepwise LDA feature selection strategy with simplex
optimization was used to select the best texture feature subset and reduce the dimensionality
of the feature space. Two-fold cross validation was used to train and test the CAD systems,
as discussed below. For each of the two cross validation cycles, the algorithm used a
leave-one-case-out resampling method and simplex optimization within the training subset to
estimate the best threshold values, Fin, Fout, and tolerance, based on the F statistics for
stepwise feature selection. The chosen Fin, Fout, and tolerance values were then used to
select a set of features, and the weights for the LDA classifier were estimated from the
training subset. The test subset was thus independent of the classifier training in each
cross-validation cycle. This procedure has been described in more detail previously.23
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4.  Bilateral  Information  Fusion 


4.1  Bilateral  LDA  Classifier

The bilateral LDA classifier incorporates the “symmetry” information on the left and right
breasts to differentiate symmetric (likely FPs) and asymmetric (likely masses) structures.
Bilateral features are derived from the unilateral SGLD texture features and the
morphological features for each pair of ROIs, that is, a detected mass candidate and its
corresponding ROI, using the following relationship:

$$ BF[i,j] = \frac{\max\left(MC[i,j],\ CR[i,j]\right)}{\min\left(MC[i,j],\ CR[i,j]\right)} \qquad (1) $$

where $MC[i,j]$ and $CR[i,j]$ are the $i$-th feature of the $j$-th mass candidate and the
$i$-th feature of the $j$-th corresponding ROI, respectively. The bilateral LDA classifier
was trained in a similar way as that for the unilateral LDA classifier, as described above.
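The max-over-min ratio of Eq. (1) can be sketched for one candidate and its corresponding ROI as follows. The function name and the decision to raise an error when the smaller value is zero are assumptions; the source does not say how zero or negative feature values were handled.

```python
def bilateral_features(mc_features, cr_features):
    """Bilateral feature vector BF = max(MC, CR) / min(MC, CR), per feature.

    `mc_features` and `cr_features` hold the same features, in the same
    order, for a mass candidate and its corresponding contralateral ROI.
    """
    if len(mc_features) != len(cr_features):
        raise ValueError("feature vectors must have the same length")
    ratios = []
    for mc, cr in zip(mc_features, cr_features):
        smaller, larger = min(mc, cr), max(mc, cr)
        if smaller == 0:
            raise ValueError("ratio is undefined when the smaller value is zero")
        ratios.append(larger / smaller)
    return ratios
```

For positive feature values the ratio is 1 when the two ROIs match exactly and grows as they differ, so it does not depend on which breast contains the candidate.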


4.2  Bilateral  CAD  System

In  the  last  stage,  the  discriminant  scores  of  the  unilateral  and  bilateral  classifiers  are  merged  by  a 
third  LDA.  The  weights  of  this  LDA  classifier  were  also  trained  with  the  training  subset.  The 
output  score  from  the  third  LDA  is  used  to  differentiate  TPs  from  FPs  in  the  bilateral  CAD  system. 
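The score fusion stage can be illustrated with a minimal two-input Fisher linear discriminant that combines a unilateral score and a bilateral score into one output score. This is a sketch under stated assumptions, not the trained classifier of this study: the function names, the use of pooled within-class scatter, and the omission of a bias term are illustrative choices.

```python
def fit_fusion_lda(mass_pairs, fp_pairs):
    """Fit weights w = Sw^-1 (mean_mass - mean_fp) for 2-D score pairs.

    Each pair is (unilateral_score, bilateral_score) from the training subset.
    """
    def mean(pairs):
        n = len(pairs)
        return sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n

    def scatter(pairs, m):
        sxx = sxy = syy = 0.0
        for x, y in pairs:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
        return sxx, sxy, syy

    m_mass, m_fp = mean(mass_pairs), mean(fp_pairs)
    a, b = scatter(mass_pairs, m_mass), scatter(fp_pairs, m_fp)
    sxx, sxy, syy = a[0] + b[0], a[1] + b[1], a[2] + b[2]
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m_mass[0] - m_fp[0], m_mass[1] - m_fp[1]
    # Multiply the mean difference by the inverse of the 2x2 scatter matrix.
    return (syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det


def fused_score(weights, unilateral_score, bilateral_score):
    """Linear combination of the two classifier scores."""
    return weights[0] * unilateral_score + weights[1] * bilateral_score
```

The fused score is then thresholded; sweeping the threshold traces out the operating points of the combined system.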


5.  Evaluation  Methods

The detected individual objects were compared with the true mass location marked by an
experienced radiologist. An object was considered to be a true positive (TP) if the overlap
between the detected object and the true mass was greater than 25%. The 25% threshold was
selected as described in our previous study.30

To evaluate the performance of our bilateral LDA classifier, the test discriminant scores
were analyzed using receiver operating characteristic (ROC) methodology.31 The accuracy for
classification of mass and normal tissue was evaluated as the area under the ROC curve, Az.
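As an illustration of what the area under the ROC curve measures, the sketch below computes a nonparametric estimate from two lists of discriminant scores. This section does not state how Az was estimated in the study, so the Mann-Whitney form and the function name used here are assumptions and may differ from the fitted ROC curve implied by the Az notation.

```python
def area_under_roc(mass_scores, normal_scores):
    """Nonparametric area under the ROC curve (Mann-Whitney statistic).

    Returns the fraction of (mass, normal) score pairs in which the mass
    scores higher, counting ties as one half.
    """
    wins = 0.0
    for m in mass_scores:
        for n in normal_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(mass_scores) * len(normal_scores))
```

A value of 0.5 corresponds to chance-level separation of masses from normal tissue and 1.0 to perfect separation.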

The detection performance of the bilateral CAD system was assessed by free response ROC
(FROC) analysis. An FROC curve shows the relationship between the detection sensitivity and
the FP rate as the decision threshold varies. FROC curves were presented on a per-image and a
per-case basis. For image-based FROC analysis, the mass on each mammogram was considered an
independent true object. For case-based FROC analysis, the same mass imaged on the two-view
mammograms was considered to be one true object and detection of the mass on either view or
on both views was considered to be a TP detection.
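The construction of FROC operating points can be sketched as follows, with sensitivity counted on true masses and FPs counted on normal images. The input layout is an assumption made for this sketch: each true object is summarized by the highest score among the detections that hit it (`None` if it was never detected), which for case-based analysis would be the maximum over the two views.

```python
def froc_points(true_object_scores, fp_scores, n_normal_images, thresholds):
    """Return a list of (FPs per image, sensitivity), one per threshold.

    `true_object_scores`: best detection score per true object, or None.
    `fp_scores`: scores of all detections on the normal images.
    """
    points = []
    for t in thresholds:
        hits = sum(1 for s in true_object_scores if s is not None and s >= t)
        fps = sum(1 for s in fp_scores if s >= t)
        points.append((fps / n_normal_images, hits / len(true_object_scores)))
    return points
```

Lowering the threshold moves along the curve toward higher sensitivity at the cost of more FP marks per image.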

Two sets of trained parameters were acquired as a result of the 2-fold cross validation
training. To estimate the FP rate on normal mammograms when the trained CAD system is used in
a screening setting, we applied the trained unilateral and bilateral systems to the 260
no-mass mammograms for independent testing. The number of FP marks produced by the algorithm
was estimated by counting the detected objects on these normal cases only. The mass
sensitivity was determined by counting only the masses on the corresponding test mass subset.
The combination of the sensitivity from the test mass subset and the FP rate from the normal
data set at the corresponding detection thresholds resulted in a test FROC curve. The
training and testing procedures were performed for each cycle of the two-fold cross
validation process, thereby generating two test FROC curves. To estimate the overall
performance of the CAD system, an average test FROC curve was obtained by averaging the FP
rates from the FROC curves of the two mass subsets at the corresponding sensitivities.
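Averaging two test FROC curves at matching sensitivities can be sketched as below. Because the two curves are generally sampled at different sensitivities, this sketch uses linear interpolation between neighboring operating points; that interpolation and the function names are assumptions, since the source does not describe how the FP rates were read off at a common sensitivity.

```python
def fp_rate_at(curve, sensitivity):
    """Interpolate the FP rate of a curve at the given sensitivity.

    `curve` is a list of (fp_rate, sensitivity) sorted by sensitivity.
    """
    for (f0, s0), (f1, s1) in zip(curve, curve[1:]):
        if s0 <= sensitivity <= s1:
            if s1 == s0:
                return f0
            return f0 + (f1 - f0) * (sensitivity - s0) / (s1 - s0)
    raise ValueError("sensitivity is outside the range of the curve")


def average_froc(curve_a, curve_b, sensitivities):
    """Average the FP rates of two FROC curves at each listed sensitivity."""
    return [((fp_rate_at(curve_a, s) + fp_rate_at(curve_b, s)) / 2.0, s)
            for s in sensitivities]
```

The averaged curve reports, for each sensitivity, the mean number of FPs per image over the two cross-validation cycles.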
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Chakraborty  et  al.  proposed  a  JAFROC  method  and  provided  software  to  estimate  the 
statistical significance of the difference between two FROC curves. We employed the JAFROC
analysis to evaluate the difference in the FROC curves obtained from the unilateral CAD
system and the bilateral CAD system.

III.  RESULTS 

A.  Bilateral  Feature  Analysis 

Figures 6 and 7 show examples of detection results obtained from the unilateral system and
the bilateral system. Figure 6 shows a mass that was initially detected as a mass candidate
but was excluded in the false positive reduction steps, and was therefore an FN of the
unilateral CAD system. The bilateral analysis increased the likelihood score of this mass. It
was therefore not excluded in the false positive reduction steps and became a TP in the
bilateral CAD system.

Figure 7 shows an example of an FP detected by the unilateral CAD system. The FP was excluded
in the bilateral system because the bilateral analysis found it to have high symmetry with
the tissue in the contralateral breast, as shown in the ROI in Figure 7(d).

B.  Performance  Evaluation 

In the prescreening process, we obtained a large number of mass candidates on each mammogram.
Each mass candidate was paired with a corresponding ROI in the contralateral breast. A total of
3127 and 3402 mass candidates were extracted for training subset 1 and subset 2, respectively,
which included 98.5% (134/136) and 99.3% (139/140) of the masses in the two subsets. The mass
candidates in the unilateral mammograms and the ROI pairs from bilateral mammograms in the
training subset were used to design the unilateral and bilateral classifiers in each of the 2-fold
cross-validation cycles. The most effective subset of features from the available feature pool was
selected for each of the training subsets during the training procedure. For the unilateral LDA
classifier, twenty (11 global and 9 local) and nineteen (12 global and 7 local) texture features were
selected from the two independent training subsets, respectively. For the bilateral LDA classifier,
twenty-four (11 global texture, 9 local texture, and 4 morphological) and twenty-three (12 global, 8
local, and 3 morphological) features were selected from the two independent training subsets,
respectively. The validation Az values of the LDA classifier during the leave-one-case-out training
were 0.846 ± 0.011 and 0.832 ± 0.009, respectively, for the two training subsets using the unilateral
LDA classifier, and were 0.862 ± 0.015 and 0.859 ± 0.012, respectively, using the bilateral LDA
classifier. The classifiers achieved Az values of 0.833 ± 0.015 and 0.831 ± 0.011, respectively, for
the two test subsets using the unilateral LDA classifier, and 0.853 ± 0.013 and 0.849 ± 0.011,
respectively, using the bilateral LDA classifier.

Figure 8 shows the average test FROC curves for the unilateral and bilateral CAD systems
after FP reduction with the corresponding trained LDA classifiers when the FP rates were estimated
from the test subsets with masses. Figure 9 shows the corresponding results when the FP rates were
estimated on the set of no-mass mammograms. Table I summarizes the average FP rates estimated
with both the mass and no-mass data sets at several case-based sensitivities.

Because the detection performance of CAD systems on cancer cases is of prime importance,
we analyzed the performance of our CAD systems for the subset of cases containing malignant
masses. Figure 10 compares the average test FROC curves for the unilateral and bilateral CAD
systems on malignant cases only. Figure 11 shows the average test FROC curves for the unilateral
and bilateral CAD systems with the sensitivities estimated on malignant cases only and the FP rates
estimated on the set of no-mass mammograms. The bilateral CAD system achieved case-based
sensitivities of 70%, 80%, and 85% at average FP rates of 0.35, 0.75, and 0.95 FPs/image,
respectively, on the test subset of malignant masses. In comparison to the average FP rates for the
unilateral CAD system of 0.58, 1.33, and 1.63 FPs/image, respectively, at the corresponding
sensitivities, the FP rates were reduced by 40%, 44%, and 42% with the bilateral symmetry
information. Table II summarizes the average FP rates estimated with both the mass and no-mass
data sets for cases with malignant masses only at several case-based sensitivities.
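
As a quick arithmetic check, the reported FP reductions follow from the relative difference between the unilateral and bilateral FP rates. The short sketch below recomputes them from the FP rates quoted above; the function name `fp_reduction_percent` is a hypothetical helper introduced only for this illustration.

```python
def fp_reduction_percent(fp_unilateral, fp_bilateral):
    """Relative FP reduction, in percent, of the bilateral system."""
    return 100.0 * (fp_unilateral - fp_bilateral) / fp_unilateral

# (unilateral, bilateral) FP rates in FPs/image at 70%, 80%, and 85%
# case-based sensitivity, as quoted in the text for malignant cases.
pairs = [(0.58, 0.35), (1.33, 0.75), (1.63, 0.95)]
reductions = [round(fp_reduction_percent(u, b)) for u, b in pairs]
print(reductions)
```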

The figure-of-merit (FOM) from the output of the JAFROC software is summarized in Table
III(a) for all cases and in Table III(b) for malignant cases only. The difference between the FOMs for
the unilateral and the bilateral CAD systems was statistically significant (p<0.05) for all
comparisons.


IV.  DISCUSSION 

Symmetry between breast structures in bilateral pairs of mammograms is an important feature used
by radiologists for mass detection or FP reduction. Similar structures that appear in both right and
left mammograms are more likely to be normal tissue than abnormal lesions. Our bilateral analysis
translates this knowledge of radiologists into computer vision techniques so that the CAD system can
utilize the symmetry of breast tissue on bilateral mammograms to improve detection accuracy. The
results of our study show that the use of bilateral information is an effective technique for reducing FPs.

The bilateral features are important factors affecting the performance of the bilateral LDA
classifier. In this study, the bilateral features were derived from features extracted from each pair of
ROIs, i.e., the mass candidate and its corresponding ROI, using the maximum-to-minimum ratio
strategy as shown in Eq. (1). We also investigated whether other strategies, including

$$BF[i,j] = \frac{MC[i,j]}{CR[i,j]}, \quad BF[i,j] = \frac{MC[i,j] - CR[i,j]}{MC[i,j]}, \quad \text{and} \quad BF[i,j] = \frac{MC[i,j] - CR[i,j]}{(MC[i,j] + CR[i,j])/2},$$

could improve the performance of the bilateral CAD system. It was found that these strategies are
not as effective as the maximum-to-minimum ratio. Specifically, among the $A_z$ values of all
bilateral features, 72% of those from the latter strategies are lower than those of their corresponding
features obtained by Eq. (1). The advantage of using bilateral symmetry measures defined by the
maximum-to-minimum ratio can be seen by considering the following example: assuming two ROI
pairs that are highly asymmetric, $(MC_1, CR_1)$ and $(MC_2, CR_2)$, in which $MC_1 > CR_1$ and
$MC_2 < CR_2$, their bilateral features derived as the maximum-to-minimum ratio will both be
greater than 1. However, the bilateral features obtained from $BF[i,j] = MC[i,j]/CR[i,j]$ will be
greater than 1 for $(MC_1, CR_1)$ but smaller than 1 for $(MC_2, CR_2)$. The bilateral measures
obtained from $BF[i,j] = (MC[i,j] - CR[i,j])/MC[i,j]$ or
$BF[i,j] = (MC[i,j] - CR[i,j])/((MC[i,j] + CR[i,j])/2)$ will be positive for $(MC_1, CR_1)$ but
negative for $(MC_2, CR_2)$. The bilateral feature defined in Eq. (1) therefore describes the asymmetry
between the ROI pairs, regardless of which ROI has a larger feature value, whereas the other three
bilateral features do not consistently provide feature values in the same direction. The
maximum-to-minimum ratio approach can thus achieve better performance than the other three
strategies.
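
The direction argument above can be made concrete with a small numerical sketch. It assumes strictly positive scalar feature values; the function names and the two example ROI pairs are hypothetical and chosen only to mirror the case MC1 > CR1 and MC2 < CR2.

```python
def bf_max_min(mc, cr):
    """Maximum-to-minimum ratio (the strategy of Eq. (1))."""
    return max(mc, cr) / min(mc, cr)

def bf_ratio(mc, cr):
    """Plain ratio MC / CR."""
    return mc / cr

def bf_rel_diff(mc, cr):
    """Difference normalized by the mass-candidate value."""
    return (mc - cr) / mc

def bf_norm_diff(mc, cr):
    """Difference normalized by the mean of the two values."""
    return (mc - cr) / ((mc + cr) / 2)

# Two equally asymmetric ROI pairs with opposite ordering (made-up values).
pair1 = (4.0, 1.0)  # MC1 > CR1
pair2 = (1.0, 4.0)  # MC2 < CR2

for name, f in [("max/min", bf_max_min), ("ratio", bf_ratio),
                ("rel. diff", bf_rel_diff), ("norm. diff", bf_norm_diff)]:
    print(f"{name:10s} pair1={f(*pair1):6.2f}  pair2={f(*pair2):6.2f}")
```

Only the maximum-to-minimum ratio gives the same value for both pairs; the other three measures change side of 1 (or sign) when the ordering of the two ROIs is swapped.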

The corresponding ROI registration is an important procedure in the bilateral analysis. The two
breasts of a given patient are not perfectly symmetrical and other factors such as positioning and
compression further introduce variability in the symmetry. We investigated the effect of variability
in the registered ROI locations on bilateral analysis. For this purpose, the pre-screening step of our
unilateral CAD system was first applied to the contralateral mammogram to locate the mass
candidates. For a given ROI predicted by the registration method on the contralateral mammogram,
its location was compared to the ROI locations of these mass candidates by evaluating an overlap
ratio, defined as the area of intersection between the predicted ROI and a mass candidate ROI relative
to the area of the smaller of the two ROIs. If the overlap ratio of the predicted ROI with a mass
candidate ROI was greater than a chosen threshold, the location of the predicted ROI was changed to
the location of the mass candidate ROI. If the predicted ROI overlapped with more than one mass
candidate ROI, the mass candidate ROI having the largest overlap ratio that exceeded the threshold
was used. We evaluated the effects of this ROI location adjustment for a range of thresholds. It was
found that when the overlap ratio threshold was chosen to be about 0.7 to 0.9, the performance of the
bilateral CAD system showed a small but insignificant improvement compared to the bilateral
CAD system without the ROI adjustment process. When the overlap ratio threshold was smaller than
0.5, the performance of the bilateral CAD system was degraded. This study indicated that small
variability of the predicted ROI location on the contralateral mammogram does not have a strong
effect on the performance of the bilateral analysis.
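
The ROI adjustment rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: ROIs are taken to be axis-aligned rectangles given by a corner, width, and height; the `ROI` type, the function names, and the default threshold of 0.8 (a value inside the 0.7 to 0.9 range mentioned in the text) are hypothetical and not taken from the actual implementation.

```python
from typing import NamedTuple, Sequence

class ROI(NamedTuple):
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

def overlap_ratio(a: ROI, b: ROI) -> float:
    """Intersection area divided by the area of the smaller ROI."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    return (ix * iy) / min(a.w * a.h, b.w * b.h)

def adjust_roi(predicted: ROI, candidates: Sequence[ROI],
               threshold: float = 0.8) -> ROI:
    """Move the predicted ROI to the best-overlapping mass candidate ROI.

    The predicted ROI is kept unless some candidate exceeds the threshold;
    among those, the candidate with the largest overlap ratio is used.
    """
    best, best_ratio = predicted, threshold
    for cand in candidates:
        ratio = overlap_ratio(predicted, cand)
        if ratio > best_ratio:
            best, best_ratio = cand, ratio
    return best

predicted = ROI(0, 0, 10, 10)
candidates = [ROI(1, 1, 10, 10), ROI(5, 5, 10, 10)]
print(adjust_roi(predicted, candidates))
```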

Various registration methods have been attempted for registration of mammograms of the
same breast, for example, the warping approach proposed by Sallam et al. and the
multiple-control-point approach proposed by Vujovic et al.34. Those approaches depended on the
identification of corresponding control points. However, there are few, if any, invariant landmarks
on mammograms that can be identified automatically because the breast is composed of soft tissue.
The projected image of the breast tissue often changes even when the same breast is compressed two
different times. It is even more variable between a breast and its contralateral breast. Commonly
used rigid or non-rigid registration methods will not be appropriate for this application. We
therefore developed the regional registration method for correlation of ROIs on mammograms. Our
regional registration method uses the nipple and the distance between the nipple and the ROI center
as the relatively invariant information. The lesion in the target breast is estimated to be located
within a band of tissue centered along the arc traced using the nipple-to-lesion distance as the radius
and with the origin at the nipple. This method emulates a technique used by many radiologists in
identifying corresponding lesions in two-view mammograms or current and prior mammograms.
Van Engeland et al. compared methods for mammogram registration based on breast alignment
and linear and non-linear warping. They concluded that linear warping using mutual information
performed better than the other methods. We also performed a study comparing our regional
registration method to correlation-based or mutual-information-based linear and non-linear warping
methods using a data set of 390 current and prior mammogram pairs. Our results showed that the
regional registration method outperformed the warping approaches in identifying corresponding
lesions on the mammogram pairs. The localization of symmetric ROIs on the bilateral breasts is
similar to the problem of registering ROIs on current and prior mammograms. We therefore
adapted the regional registration method to the bilateral analysis in this study.
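
The geometric idea behind the regional registration, a search band centered on an arc around the nipple, can be sketched as below. This is only an illustration under stated assumptions: points are (x, y) pixel coordinates, the band half-width is a free parameter, and the function names are hypothetical. How the final corresponding ROI is chosen within the band is not described in this passage and is not attempted here.

```python
import math

def nipple_distance(nipple, point):
    """Euclidean distance from the nipple to a point, in pixels."""
    return math.hypot(point[0] - nipple[0], point[1] - nipple[1])

def in_search_band(nipple_src, roi_center_src, nipple_dst, point_dst,
                   half_width):
    """True if point_dst lies in the band around the arc on the target image.

    The arc is centered at the target nipple and its radius is the
    nipple-to-ROI distance measured on the source image. half_width is a
    hypothetical band half-width in pixels.
    """
    radius = nipple_distance(nipple_src, roi_center_src)
    return abs(nipple_distance(nipple_dst, point_dst) - radius) <= half_width

# Made-up coordinates: the candidate is 50 pixels from the source nipple.
print(in_search_band((0, 0), (30, 40), (100, 0), (70, 40), half_width=5))
print(in_search_band((0, 0), (30, 40), (100, 0), (100, 10), half_width=5))
```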

To implement the bilateral analysis in a practical CAD system, the nipple locations have to
be detected automatically. We have previously developed a nipple detection algorithm to
determine the nipple location on a mammogram. The algorithm could detect the nipple locations
within 1 cm of the manually identified locations in about 70% of the images in the data set used in
this study. A large deviation of the nipple location from the true location may affect the regional
registration technique in locating the symmetric ROI on the contralateral mammogram, which in turn
may degrade the performance of the bilateral analysis of tissue symmetry. We therefore used the
manually identified nipple locations in this study in order to develop the bilateral classifier without
the influence of other confounding factors. Further work is underway to improve the nipple detection
algorithm and to investigate the effect of nipple detection accuracy on the performance of the
bilateral system.

The inward nipple projection is often a result of positioning and compression problems so
that the nipple is not projected in profile. Since there is not enough information from the 2D
projected mammograms to correct for the deformation of the breast, we designed a simple, ad hoc
correction method to allow the arc drawn using the nipple-to-mass distance as the radius to intersect
the breast boundary. In these cases, the breast image on the bilateral mammogram often does not
have a similar positioning problem and the difference in the compression of the two breasts may
cause large uncertainty in the registration regardless of the correction method. For cases in which
both breasts actually have inward nipples and the breast images are similar, our correction method
will not cause additional errors because a similar correction will be applied to the bilateral
mammograms and symmetric ROIs will be identified on the mammograms.

Our motivation for this study is to reduce the FPs of a CAD system for mass detection. A
CAD detection system is generally intended for use in screening mammography. At the screening
stage, all lesions of concern should be pointed out to radiologists so that the radiologists can judge
whether a recall is warranted. If a detection system is trained to mark only the malignant lesions, it
may be attempting to play the role of a triage system (alerting radiologists to work up only
“malignant” cases) rather than that of a second reader. Furthermore, since computerized lesion
detection or characterization on mammograms is not 100% sensitive, radiologists would be unable to
tell whether an unmarked suspicious lesion was missed or was considered benign by the
computer. We believe that computer-aided diagnosis (CADx) may be used in different ways in
conjunction with a CAD detection system. For example, the likelihood of malignancy may be
estimated by the CADx system and displayed for every detected lesion, and/or a CADx system may
be used during diagnostic workup. Either way the CAD system will first alert radiologists to all
masses, leaving the assessment of malignancy or benignity to a second stage. We therefore
included both malignant and benign masses in the training sets to train the system to detect all
masses.

V.  CONCLUSIONS 

We developed an FP reduction method to improve computerized mass detection on mammograms
based on analysis of bilateral information. It was found that the false positives can be reduced by
training a new classifier for bilateral features and combining its output score with the unilateral
classifier score. The bilateral CAD system achieved case-based sensitivities of 70%, 80%, and 85%
for detection of malignant masses at average FP rates of 0.35, 0.75, and 0.95 FPs/image, respectively,
on the test data set. In comparison to the average FP rates for the unilateral CAD system of 0.58,
1.33, and 1.63 FPs/image, respectively, at the corresponding sensitivities, the FP rates were reduced
by 40%, 44%, and 42% with the bilateral symmetry information. The improvement in the overall
detection accuracy is statistically significant (p<0.05) by JAFROC analysis. Our results demonstrate
that the bilateral analysis can differentiate the similarity and dissimilarity between tissues at
corresponding locations in the bilateral views, and is useful for improving the performance of a
unilateral CAD system by further reducing the FPs.

TABLE I. The average FP rates (FPs/image) and FP reduction rates at case-based sensitivities of
70%, 80%, and 85% for the test subsets when the FP rates were estimated from the mass and
no-mass data sets.

              FP rate estimated from              FP rate estimated from
              mass data set                       no-mass data set
Sensitivity   Unilateral  Bilateral  FP           Unilateral  Bilateral  FP
              CAD         CAD        Reduction    CAD         CAD        Reduction
70%           0.70        0.53       24%          0.86        0.53       38%
80%           1.10        0.87       21%          1.32        1.04       21%
85%           1.46        1.15       21%          1.72        1.32       23%

TABLE II. The average FP rates (FPs/image) and FP reduction rates for cases with malignant masses
at case-based sensitivities of 70%, 80%, and 85% for the test subsets when the FP rates were
estimated from the mass and no-mass data sets.

              FP rate estimated from              FP rate estimated from
              mass data set                       no-mass data set
Sensitivity   Unilateral  Bilateral  FP           Unilateral  Bilateral  FP
              CAD         CAD        Reduction    CAD         CAD        Reduction
70%           0.43        0.33       23%          0.58        0.35       40%
80%           0.78        0.62       21%          1.33        0.75       44%
85%           0.94        0.78       17%          1.63        0.95       42%

TABLE III. Estimation of the statistical significance of the difference between the FROC
performance of the unilateral and bilateral CAD systems on test subsets 1 and 2. The FP rates of
the FROC curves were estimated from the no-mass data set: (a) all cases, and (b) malignant
cases.

(a)
FOM (JAFROC)      Test subset 1      Test subset 2
Unilateral CAD    0.52               0.48
Bilateral CAD     0.58               0.51
p value           <0.001             0.008

(b)
FOM (JAFROC)      Test subset 1      Test subset 2
                  (malignant only)   (malignant only)
Unilateral CAD    0.56               0.53
Bilateral CAD     0.61               0.56
p value           0.009              0.003

FIGURE CAPTIONS

Figure 1. The characteristics of our mass data set. (a) Distribution of mass sizes, (b) distribution of
mass shapes, (c) distribution of mass margins, C: circumscribed, Ind: indistinct, M: microlobulated,
Ob: obscured, Sp: spiculated, (d) distribution of the breast density in terms of BI-RADS category
estimated by an MQSA radiologist.

Figure 2. Block diagram of the bilateral CAD system for mass detection on mammograms.

Figure 3. An example of performing the mass candidate identification. (a) An original mammogram,
(b) the detected breast boundary of (a), a mass is marked by the arrow, (c) the detected mass
candidates of (a).

Figure 4. An example of obtaining the corresponding ROI of a mass candidate on the contralateral
mammogram. (a) Mass candidate on the left MLO view at m, (b) corresponding ROI on the right
MLO view at m.

Figure 5. An example of obtaining the corresponding ROI based on the modified regional
registration technique. (a) The nipple location (o), the shifted origin (n), and the mass candidate (m),
(b) corresponding ROI on the contralateral mammogram.

Figure 6. (a) Mammogram containing a mass marked by the rectangular box. (b) The contralateral
mammogram of (a); the rectangular box is the corresponding ROI of the mass in (a) estimated by
the automated regional registration technique. (c) ROI extracted from (a) containing a mass
detected at the prescreening stage but excluded at the final stage of the unilateral CAD system. (d)
The corresponding ROI in the contralateral breast. Bilateral analysis of this ROI pair increased the
likelihood score of the mass, which was then detected as a TP in the bilateral CAD system.

Figure 7. (a) Mammogram and the rectangular ROI containing a mass candidate. (b) The
contralateral mammogram of (a); the rectangular box is the corresponding ROI of the mass
candidate in (a). (c) ROI extracted from (a) containing normal tissue detected at the prescreening
stage and included as an FP at the final stage of the unilateral CAD system. (d) The corresponding
ROI in the contralateral breast. Bilateral analysis of this ROI pair reduced the likelihood score of
the normal tissue, which then became a TN in the bilateral CAD system.

Figure 8. (a) Image-based and (b) case-based average test FROC curves from the unilateral and the
bilateral CAD systems. The FP rates were estimated from detection on mammograms in the test
subsets with masses.

Figure 9. (a) Image-based and (b) case-based average test FROC curves from the unilateral and the
bilateral CAD systems. The FP rates were estimated from detection on mammograms in the no-mass
data set.

Figure 10. (a) Image-based and (b) case-based average test FROC curves from the unilateral and
bilateral CAD systems for detection on cases with malignant masses only. The FP rates were
estimated from the same data set.

Figure 11. (a) Image-based and (b) case-based average test FROC curves from the unilateral and
bilateral CAD systems for detection on cases with malignant masses only. The FP rates were
estimated from the no-mass data set.

ABSTRACT

An important purpose of a CAD system is to serve as a second reader to alert radiologists to subtle cancers that
may be overlooked. In this study, we are developing new computer vision techniques to improve the detection
performance  for  subtle  masses  on  prior  mammograms.  A  data  set  of  159  patients  containing  318  current  mammograms 
and  402  prior  mammograms  was  collected.  A  new  technique  combining  gradient  field  analysis  with  Hessian  analysis 
was  developed  to  prescreen  for  mass  candidates.  A  suspicious  structure  in  each  identified  location  was  initially 
segmented  by  seed-based  region  growing  and  then  refined  by  using  an  active  contour  method.  Morphological,  gray 
level  histogram  and  run-length  statistics  features  were  extracted.  Rule-based  and  LDA  classifiers  were  trained  to 
differentiate  masses  from  normal  tissues.  We  randomly  divided  the  data  set  into  two  independent  sets;  one  set  of  78 
cases  for  training  and  the  other  set  of  81  cases  for  testing.  With  our  previous  CAD  system,  the  case-based  sensitivities 
on  prior  mammograms  were  63%,  48%  and  32%  at  2,  1  and  0.5  FPs/image,  respectively.  With  the  new  CAD  system, 
the  case-based  sensitivities  were  improved  to  74%,  56%  and  35%,  respectively,  at  the  same  FP  rates.  The  difference  in 
the  FROC  curves  was  statistically  significant  (p<0.05  by  AFROC  analysis).  The  performances  of  the  two  systems  for 
detection  of  masses  on  current  mammograms  were  comparable.  The  results  indicated  that  the  new  CAD  system  can 
improve  the  detection  performance  for  subtle  masses  without  a  trade-off  in  detection  of  average  masses. 

1. INTRODUCTION

Breast cancer is one of the leading causes of cancer mortality among women1. Studies indicate that radiologists do not
detect all carcinomas that are visible upon retrospective analyses of the images2-8. Computer-aided diagnosis (CAD) is
considered to be one of the promising approaches that may improve the sensitivity of mammography9,10.


An important application of a CAD system is to serve as a second reader to alert radiologists to subtle cancers that may
be overlooked. Masses retrospectively seen on prior mammograms represent the difficult cases that are more likely to
be missed by radiologists. One way to study the ability of a CAD system to detect subtle cancers is to evaluate its
accuracy in detecting missed cancers on prior mammograms. Our previous experience indicates that CAD schemes
trained with cancers on current images do not perform well in detecting masses seen retrospectively on prior images11.
In this study, we designed new techniques to improve the detection performance for subtle masses on prior
mammograms and also evaluated the new CAD system on both prior and current mammograms by comparing it with our
previously developed CAD system12.


2.  MATERIALS  AND  METHODS 


2.1  Materials 

All  mammograms  in  this  study  were  collected  from  patient  files  in  the  Department  of  Radiology  at  the  University  of 
Michigan  with  Institutional  Review  Board  (IRB)  approval.  The  mammograms  were  digitized  with  a  LUMISYS  85 
laser film scanner with a pixel size of 50 µm × 50 µm and 4096 gray levels. The scanner was calibrated to have a linear
relationship between gray levels and optical densities (O.D.) from 0.1 to greater than 3 O.D. units. The nominal O.D.

2.2 Methods


2.2.1  CAD  System  Overview 


Figure  3.  Block  diagram  of  a  single  CAD  system  for  mass  detection  on  mammograms. 


Our CAD system consists of the following processing steps: 1) pre-screening of mass candidates, 2) identification of suspicious
objects, 3) extraction of morphological and texture features, and 4) classification between the normal and the abnormal
regions by using rule-based and LDA classifiers. The block diagram for the CAD system is shown in Figure 3.

For the pre-screening stage, we developed a new prescreening technique in which gradient field analysis was combined
with Hessian analysis to identify mass candidates. Both gradient field and Hessian analyses were designed to enhance
circular structures on mammograms and to suppress objects with other shapes. Gradient field analysis used the
information of gradient field directions, and Hessian analysis used the second derivatives by solving for the eigenvalues
of the Hessian matrix. After this enhancement filtering, the local maxima within the breast region were identified as
the  mass  candidates  on  each  mammogram.  The  suspicious  structure  in  each  identified  location  was  initially  extracted 
by  a  seed-based  region  growing  method.  An  active  contour  method  was  then  used  to  further  refine  the  initial 
segmentation.  Morphological,  gray  level  histogram  and  run-length  statistics  (RLS)  features  were  extracted  from  the 
original  region  of  interest  (ROI)  and  the  orientation  field  of  the  ROI  for  reduction  of  FPs. 
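
The Hessian part of the prescreening step can be illustrated with a minimal sketch. The text does not give the filter formula, scale, or smoothing used, so everything below is an assumption: the function names are hypothetical, the second derivatives are plain central differences on an unsmoothed image, and the response measure is one common choice that is large when both eigenvalues are negative and similar (a bright, roughly circular structure) and near zero for line-like structures. The gradient field analysis is not sketched here.

```python
import math

def hessian_eigenvalues(img, r, c):
    """Eigenvalues (larger first) of the 2x2 Hessian at interior pixel (r, c).

    Second derivatives are approximated with central differences.
    """
    ixx = img[r][c + 1] - 2.0 * img[r][c] + img[r][c - 1]
    iyy = img[r + 1][c] - 2.0 * img[r][c] + img[r - 1][c]
    ixy = (img[r + 1][c + 1] - img[r + 1][c - 1]
           - img[r - 1][c + 1] + img[r - 1][c - 1]) / 4.0
    half_trace = (ixx + iyy) / 2.0
    root = math.sqrt(((ixx - iyy) / 2.0) ** 2 + ixy ** 2)
    return half_trace + root, half_trace - root

def blob_response(img, r, c):
    """Assumed circular-structure measure (not necessarily the authors' filter).

    Returns 0 unless both eigenvalues are negative; otherwise
    |lam1|^2 / |lam2| with |lam1| <= |lam2|, which equals |lam| for an
    isotropic blob and tends to 0 for a ridge.
    """
    lam1, lam2 = hessian_eigenvalues(img, r, c)
    if lam1 >= 0.0:
        return 0.0
    return lam1 * lam1 / abs(lam2)
```

In such a scheme the response would be computed at every pixel inside the breast region, and its local maxima would serve as mass candidates.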


2.2.2  Training and testing of the CAD system

The  hold-out  method  was  used  for  training  and  testing  our  CAD  system.  We  randomly  separated  the  entire  data  set  by 
case  into  two  independent  subsets,  the  training  subset  including  78  cases  with  156  current  and  200  prior  mammograms 
and  the  test  subset  including  81  cases  with  162  current  and  202  prior  mammograms.  The  training  included  selection  of 
proper  parameters  and  features  for  the  classifier  in  the  CAD  system.  Once  the  training  was  completed,  the  parameters 
and  features  were  fixed  for  testing.  The  new  system  was  trained  by  using  prior  mammograms  in  the  training  set  only. 
The  performance  of  the  new  system  was  compared  with  that  of  the  previous  CAD  system  on  the  current  and  prior 
mammograms  in  the  test  set. 
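
A split "by case" keeps all mammograms of one patient case in the same subset, so that no case contributes to both training and testing. The following is a minimal sketch of that idea only; the function name, the seed, and the integer case identifiers are hypothetical, and the actual randomization procedure used in the study is not described in the text.

```python
import random

def split_by_case(case_ids, n_train, seed=0):
    """Randomly assign whole cases to a training set and a test set.

    case_ids may contain repeats (one entry per mammogram); the split is
    done on the unique case identifiers so that a case is never divided.
    """
    unique_cases = sorted(set(case_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_cases)
    train_cases = set(unique_cases[:n_train])
    test_cases = set(unique_cases[n_train:])
    return train_cases, test_cases
```

With the 159 cases reported above, `split_by_case(ids, 78)` would yield 78 training cases and 81 test cases.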

During  training,  feature  selection  with  stepwise  LDA  was  employed  to  obtain  the  best  feature  subset  and  reduce  the 
dimensionality  of  the  feature  space  to  design  an  effective  classifier.  The  detailed  procedure  has  been  described 
elsewhere13. Briefly, at each step one feature was entered into or removed from the feature pool by analyzing its effect on
the selection criterion, which was chosen to be Wilks' lambda in this study. Since the appropriate threshold values
for  feature  entry,  feature  elimination,  and  tolerance  of  feature  correlation  were  unknown,  we  used  an  automated  simplex 
optimization  method  to  search  for  the  best  combination  of  thresholds  in  the  parameter  space.  The  simplex  algorithm 
used  a  leave-one-case-out  resampling  method  within  the  training  subset  to  select  features  and  estimate  the  weights  for 
the  LDA  classifier.  To  have  a  figure-of-merit  to  guide  feature  selection,  the  test  discriminant  scores  from  the  left-out 
cases  were  analyzed  using  receiver  operating  characteristic  (ROC)  methodology.  The  accuracy  for  classification  of 
masses  and  FPs  was  evaluated  as  the  area  under  the  ROC  curve,  Az.  In  this  approach,  feature  selection  was  performed 
without  the  left-out  case  so  that  the  test  performance  would  be  less  optimistically  biased.  However,  the  selected  feature 
set  in  each  leave-one-case-out  cycle  could  be  slightly  different  because  every  cycle  had  one  training  case  different  from 
the  other  cycles.  In  order  to  obtain  a  single  trained  classifier  to  apply  to  the  hold-out  test  subset,  a  final  stepwise  feature 
selection  was  performed  with  the  best  combination  of  thresholds,  found  in  the  simplex  optimization  procedure,  on  the 
entire  training  subset  to  obtain  the  final  set  of  features  and  estimate  the  weights  of  the  LDA.  Note  that  the  entire 
process  of  feature  selection  and  classifier  weight  estimation  was  performed  within  the  training  subset.  The  LDA 
classifier  with  the  selected  feature  set  was  then  fixed  and  applied  to  the  test  subset. 
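
To make the selection criterion concrete, the sketch below computes Wilks' lambda for a single feature and two classes (masses and FPs), where it reduces to the ratio of the within-class sum of squares to the total sum of squares. This is a simplified illustration under stated assumptions: stepwise LDA uses the multivariate form of the statistic together with entry and removal thresholds, and the function name is hypothetical. Values near 0 indicate good class separation; values near 1 indicate almost none.

```python
def wilks_lambda_1d(mass_values, fp_values):
    """Univariate Wilks' lambda: within-class SS divided by total SS."""
    def sum_sq(values, center):
        return sum((v - center) ** 2 for v in values)

    all_values = list(mass_values) + list(fp_values)
    grand_mean = sum(all_values) / len(all_values)
    mass_mean = sum(mass_values) / len(mass_values)
    fp_mean = sum(fp_values) / len(fp_values)

    within = sum_sq(mass_values, mass_mean) + sum_sq(fp_values, fp_mean)
    total = sum_sq(all_values, grand_mean)
    return within / total
```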


2.2.3  Evaluation  methods 

We  used  a  free-response  receiver  operating  characteristic  (FROC)  method  to  assess  the  overall  performance  of  the  CAD 
scheme  on  this  image  set.  An  FROC  curve  was  obtained  by  plotting  the  mass  detection  sensitivity  as  a  function  of  FP 
marks  per  image  as  the  decision  threshold  on  the  LDA  classifier  scores  varied.  The  detected  individual  objects  were 
compared  with  the  “true”  mass  locations  marked  by  the  experienced  radiologist,  as  described  above.  A  detected  object 
was  labeled  as  TP  if  the  overlap  between  the  bounding  box  of  the  detected  object  and  the  bounding  box  of  the  true  mass 
relative  to  the  larger  of  the  two  bounding  boxes  was  over  25%.  Otherwise,  it  would  be  labeled  as  FP.  The  25% 
threshold  was  selected  as  described  in  our  previous  study14. 
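
The TP criterion can be written as a short sketch. It assumes axis-aligned bounding boxes given as (x0, y0, x1, y1) with x0 < x1 and y0 < y1, and it reads "overlap relative to the larger of the two bounding boxes" as the intersection area divided by the larger box area; the function names are hypothetical.

```python
def box_area(box):
    """Area of an axis-aligned box (x0, y0, x1, y1); 0 if degenerate."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)

def overlap_fraction(detected, truth):
    """Intersection area divided by the area of the larger of the two boxes."""
    inter = box_area((max(detected[0], truth[0]), max(detected[1], truth[1]),
                      min(detected[2], truth[2]), min(detected[3], truth[3])))
    larger = max(box_area(detected), box_area(truth))
    return inter / larger if larger > 0 else 0.0

def is_true_positive(detected, truth, threshold=0.25):
    """A detection counts as TP only if the overlap exceeds the threshold."""
    return overlap_fraction(detected, truth) > threshold
```

For example, two 10 × 10 boxes offset by 4 pixels in each direction overlap by 36% and count as a TP, whereas an offset of 5 pixels gives exactly 25%, which does not exceed the threshold.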

FROC curves were presented on a per-image and a per-case basis. For image-based FROC analysis, the mass on each
mammogram was considered an independent true object; the sensitivity was thus calculated relative to the number of
visible masses by image, which was 149 and 151, respectively, for the current and prior test subsets. For case-based
FROC analysis, the same mass imaged on the two-view mammograms was considered to be one true object and
detection of either or both masses on the two views was considered to be a TP detection; the sensitivity was thus
calculated relative to the number of masses by case, which was 81 and 90, respectively, for the current and prior test
subsets. The test FROC curve for a given mass subset was estimated by counting the detected masses on the test mass
subset  for  the  sensitivity.  The  FP  marker  rate  was  estimated  from  FPs  detected  in  the  same  test  subset.  The  average 
number  of  FP  marks  per  image  produced  by  the  CAD  system  at  a  given  sensitivity  was  estimated  by  counting  the 
detected  objects  in  these  cases  at  the  corresponding  decision  threshold. 
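
The construction of FROC operating points can be sketched as follows. This is an illustrative sketch, not the authors' software: the function name and the input format are assumptions. Each detection is a tuple (score, is_tp, mass_id). For image-based analysis, mass_id would be unique per mass per image; for case-based analysis, the two views of the same mass would share one mass_id, so detecting either view counts once.

```python
def froc_points(detections, n_masses, n_images):
    """Return (FPs per image, sensitivity) pairs as the threshold is lowered.

    detections: list of (score, is_tp, mass_id); mass_id is ignored for FPs.
    Tied scores are not merged in this simplified version.
    """
    points = []
    found = set()   # distinct masses detected so far
    n_fp = 0        # FP marks accepted so far
    for score, is_tp, mass_id in sorted(detections, key=lambda d: -d[0]):
        if is_tp:
            found.add(mass_id)
        else:
            n_fp += 1
        points.append((n_fp / n_images, len(found) / n_masses))
    return points
```

Sensitivity at a given FP rate, such as 1 or 0.5 FPs/image, can then be read from the resulting list of points.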

In  order  to  compare  the  performance  of  our  CAD  systems  statistically,  we  employed  the  alternative  free-response  ROC 
(AFROC) method15. In the AFROC method, the FROC data are first transformed by counting the number of false-
positive images (FPI) instead of the FPs per image. The LDA score of an FPI is determined by the FP object with the
highest score on the image, regardless of how many lower-scoring FP objects are detected on the same image. The ROCKIT
curve fitting software and statistical significance tests for ROC analysis developed by Metz et al.16 can then be used to
analyze  the  AFROC  data. 
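
The FPI transformation described above can be sketched in a few lines. The function names and the dictionary input (image identifier mapped to the list of FP scores on that image) are hypothetical; the curve fitting and significance testing performed with ROCKIT are not reproduced here.

```python
def fpi_scores(fp_scores_by_image):
    """Keep only the highest FP score on each image that has any FP."""
    return {image: max(scores)
            for image, scores in fp_scores_by_image.items() if scores}

def fpi_fraction(fp_scores_by_image, threshold):
    """Fraction of all images with at least one FP scoring >= threshold."""
    top = fpi_scores(fp_scores_by_image)
    n_images = len(fp_scores_by_image)
    return sum(1 for s in top.values() if s >= threshold) / n_images
```

Plotting sensitivity against this false-positive image fraction, instead of FPs per image, bounds the horizontal axis to the range 0 to 1, which is what allows ROC-style curve fitting.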


3.  EXPERIMENTAL  RESULTS 

Figures 4 and 5 show the image-based and case-based FROC curves for detection of masses on prior mammograms,
respectively. The case-based sensitivities for detection of masses on the prior mammograms (typically subtle masses)
in the test subset were 56% and 35% at 1 and 0.5 FPs/image with the new CAD system, compared with 48% and
32% at the same FP rates with the previous CAD system. The improvement with the new system on prior
mammograms was statistically significant (p = 0.036). When the new system was applied to the detection of masses
on the current mammograms (typically average masses) in the test subset, the case-based sensitivities were 77% and
70% at 1 and 0.5 FPs/image, compared with 75% and 56% at the same FP rates with the previous CAD system.
The difference between the two FROC curves for detection of average masses on current mammograms was not statistically
significant (p = 0.184). Image-based and case-based FROC curves for detection of masses on current mammograms
are shown in Figures 6 and 7, respectively.

Figure  4.  Image-based  test  FROC  curves  on  prior
mammograms.  Old  CAD:  detection  by  the  previous 
CAD  system  trained  on  both  current  and  prior 
mammograms.  New  CAD:  detection  by  the  CAD 
system  trained  on  prior  mammograms. 


Figure  5.  Case-based  test  FROC  curves  on  prior 
mammograms.  Old  CAD:  detection  by  the  previous 
CAD  system  trained  on  both  current  and  prior 
mammograms.  New  CAD:  detection  by  the  CAD 
system  trained  on  prior  mammograms. 


Figure  6.  Image-based  test  FROC  curves  on  current 
mammograms.  Old  CAD:  detection  by  the  previous 
CAD  system  trained  on  both  current  and  prior 
mammograms.  New  CAD:  detection  by  the  CAD 
system  trained  on  prior  mammograms. 


Figure  7.  Case-based  test  FROC  curves  on  current 
mammograms.  Old  CAD:  detection  by  the  previous 
CAD  system  trained  on  both  current  and  prior 
mammograms.  New  CAD:  detection  by  the  CAD 
system  trained  on  prior  mammograms. 

Table 1. Estimation of the statistical significance of the difference between the FROC performances of the previous CAD
system trained on both current and prior mammograms and the proposed CAD system trained on prior mammograms.


A1 (AFROC)    Current Test Set    Prior Test Set
Old CAD       0.51                0.26
New CAD       0.50                0.31
p-value       0.184               0.036

4.  DISCUSSION  AND  CONCLUSIONS 

In  this  study,  we  improved  the  accuracy  of  a  CAD  system  for  detection  of  subtle  masses  on  prior  mammograms.  A 
new  prescreening  method  was  developed  to  improve  the  sensitivity  of  mass  detection.  A  new  mass  segmentation 
method that combined a seed-based region growing method with an active contour method was also designed. RLS
features were extracted from the original ROIs and the newly derived orientation field of the ROIs for FP reduction.
Our CAD system can significantly improve the performance of mass detection on prior mammograms without a
trade-off in the detection of masses on current mammograms. It is expected that the new CAD system can increase the
overall  accuracy  for  detection  of  subtle  early-stage  breast  cancers. 